diff mbox series

[2/2] PCI: Skip resource distribution when no hotplug bridges

Message ID 20190622210310.180905-3-helgaas@kernel.org
State Accepted
Delegated to: Bjorn Helgaas
Headers show
Series PCI: Simplify pci_bus_distribute_available_resources() | expand

Commit Message

Bjorn Helgaas June 22, 2019, 9:03 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

If "hotplug_bridges == 0", "!dev->is_hotplug_bridge" is always true, so the
loop that divides the remaining resources among hotplug-capable bridges
does nothing.

Check for "hotplug_bridges == 0" earlier, so we don't even have to compute
the amount of remaining resources.  No functional change intended.

---

I'm pretty sure this patch preserves the previous behavior of
pci_bus_distribute_available_resources(), but I'm not sure that
behavior is what we want.

For example, in the following topology, when we process bus 10, we
find two non-hotplug bridges and no hotplug bridges, so IIUC we return
without distributing any resources to them.  But I would think we
should try to give 10:1c.0 more space if possible because it has a
hotplug bridge below it.

  00:1c.0: hotplug bridge to [bus 10-2f]
    10:1c.0: non-hotplug bridge to [bus 11-2e]
      11:00.0: hotplug bridge to [bus 12-2e]
    10:1c.1: non-hotplug bridge to [bus 2f]
---
 drivers/pci/setup-bus.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Mika Westerberg June 24, 2019, 11:24 a.m. UTC | #1
On Sat, Jun 22, 2019 at 04:03:11PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> If "hotplug_bridges == 0", "!dev->is_hotplug_bridge" is always true, so the
> loop that divides the remaining resources among hotplug-capable bridges
> does nothing.
> 
> Check for "hotplug_bridges == 0" earlier, so we don't even have to compute
> the amount of remaining resources.  No functional change intended.
> 
> ---
> 
> I'm pretty sure this patch preserves the previous behavior of
> pci_bus_distribute_available_resources(), but I'm not sure that
> behavior is what we want.
> 
> For example, in the following topology, when we process bus 10, we
> find two non-hotplug bridges and no hotplug bridges, so IIUC we return
> without distributing any resources to them.  But I would think we
> should try to give 10:1c.0 more space if possible because it has a
> hotplug bridge below it.
> 
>   00:1c.0: hotplug bridge to [bus 10-2f]
>     10:1c.0: non-hotplug bridge to [bus 11-2e]
>       11:00.0: hotplug bridge to [bus 12-2e]
>     10:1c.1: non-hotplug bridge to [bus 2f]

Yes, I agree in this case we want to preserve more space for 10:1c.0.

For this patch,

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Logan Gunthorpe June 24, 2019, 4:26 p.m. UTC | #2
On 2019-06-22 3:03 p.m., Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> If "hotplug_bridges == 0", "!dev->is_hotplug_bridge" is always true, so the
> loop that divides the remaining resources among hotplug-capable bridges
> does nothing.
> 
> Check for "hotplug_bridges == 0" earlier, so we don't even have to compute
> the amount of remaining resources.  No functional change intended.

Makes sense.

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

> ---
> 
> I'm pretty sure this patch preserves the previous behavior of
> pci_bus_distribute_available_resources(), but I'm not sure that
> behavior is what we want.
> 
> For example, in the following topology, when we process bus 10, we
> find two non-hotplug bridges and no hotplug bridges, so IIUC we return
> without distributing any resources to them.  But I would think we
> should try to give 10:1c.0 more space if possible because it has a
> hotplug bridge below it.
> 
>   00:1c.0: hotplug bridge to [bus 10-2f]
>     10:1c.0: non-hotplug bridge to [bus 11-2e]
>       11:00.0: hotplug bridge to [bus 12-2e]
>     10:1c.1: non-hotplug bridge to [bus 2f]
> ---
>  drivers/pci/setup-bus.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index af28af898e42..04adeebe8866 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1887,6 +1887,9 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  		return;
>  	}
>  
> +	if (hotplug_bridges == 0)
> +		return;
> +
>  	/*
>  	 * Calculate the total amount of extra resource space we can
>  	 * pass to bridges below this one.  This is basically the
> @@ -1936,8 +1939,6 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  		 * Distribute available extra resources equally between
>  		 * hotplug-capable downstream ports taking alignment into
>  		 * account.
> -		 *
> -		 * Here hotplug_bridges is always != 0.
>  		 */
>  		align = pci_resource_alignment(bridge, io_res);
>  		io = div64_ul(available_io, hotplug_bridges);
>
Benjamin Herrenschmidt June 24, 2019, 11:45 p.m. UTC | #3
On Mon, 2019-06-24 at 14:24 +0300, Mika Westerberg wrote:
> > 
> > I'm pretty sure this patch preserves the previous behavior of
> > pci_bus_distribute_available_resources(), but I'm not sure that
> > behavior is what we want.
> > 
> > For example, in the following topology, when we process bus 10, we
> > find two non-hotplug bridges and no hotplug bridges, so IIUC we
> > return
> > without distributing any resources to them.  But I would think we
> > should try to give 10:1c.0 more space if possible because it has a
> > hotplug bridge below it.
> > 
> >    00:1c.0: hotplug bridge to [bus 10-2f]
> >      10:1c.0: non-hotplug bridge to [bus 11-2e]
> >        11:00.0: hotplug bridge to [bus 12-2e]
> >      10:1c.1: non-hotplug bridge to [bus 2f]
> 
> Yes, I agree in this case we want to preserve more space for 10:1c.0.

I sitll can't make sense of any of this stuff though.

We only every distribute resources when using
pci_assign_unassigned_bridge_resources which we only use some cases,
and it's completely non obvious why we would use it there and not in
other places.

We also don't distribute during the initial root survey meaning afaik
that we get toast for any hotplug bridge that has stuff already there
at boot.

Also, distributing the "available" space means we leave nothing for
potential SR-IOV siblings... have we ended up bloting the very PCIe-
centric assumption that it's "unlikely" that a hotplug bridge has an
SR-IOV sibling ?

This is all a terrible mess and I feel that all these little tweaks
here or there are just making it even more impossible to completely
grasp or predict how it will behave.

Ben.
Mika Westerberg June 25, 2019, 10:05 a.m. UTC | #4
On Tue, Jun 25, 2019 at 09:45:04AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2019-06-24 at 14:24 +0300, Mika Westerberg wrote:
> > > 
> > > I'm pretty sure this patch preserves the previous behavior of
> > > pci_bus_distribute_available_resources(), but I'm not sure that
> > > behavior is what we want.
> > > 
> > > For example, in the following topology, when we process bus 10, we
> > > find two non-hotplug bridges and no hotplug bridges, so IIUC we
> > > return
> > > without distributing any resources to them.  But I would think we
> > > should try to give 10:1c.0 more space if possible because it has a
> > > hotplug bridge below it.
> > > 
> > >    00:1c.0: hotplug bridge to [bus 10-2f]
> > >      10:1c.0: non-hotplug bridge to [bus 11-2e]
> > >        11:00.0: hotplug bridge to [bus 12-2e]
> > >      10:1c.1: non-hotplug bridge to [bus 2f]
> > 
> > Yes, I agree in this case we want to preserve more space for 10:1c.0.
> 
> I sitll can't make sense of any of this stuff though.
> 
> We only every distribute resources when using
> pci_assign_unassigned_bridge_resources which we only use some cases,
> and it's completely non obvious why we would use it there and not in
> other places.

We added it only for native PCIe hotplug path with the assumption that
the boot firmware takes care of the initial resource allocation. I don't
see any particular reason why it could not be called for other paths as
well, though.

> We also don't distribute during the initial root survey meaning afaik
> that we get toast for any hotplug bridge that has stuff already there
> at boot.

The boot firmware obviously needs to follow the same logic. AFAICT
recent PCs and Macs using native PCIe hotplug handle it.

> Also, distributing the "available" space means we leave nothing for
> potential SR-IOV siblings... have we ended up bloting the very PCIe-
> centric assumption that it's "unlikely" that a hotplug bridge has an
> SR-IOV sibling ?

Looking at the code, I'm not sure we reserved any additional resource
space for the SR-IOV even before pci_bus_distribute_available_resources()
was introduced. We do reserve extra bus numbers for SR-IOV in
pci_scan_child_bus_extend() so maybe we can add something similar to
resource allocation path.
Benjamin Herrenschmidt June 25, 2019, 11:48 a.m. UTC | #5
On Tue, 2019-06-25 at 13:05 +0300, Mika Westerberg wrote:
> > We only every distribute resources when using
> > pci_assign_unassigned_bridge_resources which we only use some cases,
> > and it's completely non obvious why we would use it there and not in
> > other places.
> 
> We added it only for native PCIe hotplug path with the assumption that
> the boot firmware takes care of the initial resource allocation. I don't
> see any particular reason why it could not be called for other paths as
> well, though.

Ok, we need to look into this for all the platforms who just reassign
everything in Linux (ie, ignore whatever the boot firmware did, if it
did anything).

I feel like all these platforms today will have a hard time getting
anything useful out of hotplug with our default "2M" add to the hotplug
bridges :)

> > We also don't distribute during the initial root survey meaning afaik
> > that we get toast for any hotplug bridge that has stuff already there
> > at boot.
> 
> The boot firmware obviously needs to follow the same logic. AFAICT
> recent PCs and Macs using native PCIe hotplug handle it.

What's your experience in that area ? How (well) do they handle it in
the boot firmware ? at least on arm64, boot firmwares are rather
catastrophic when it comes to PCI, and on other embedded devices they
are basically non-existent.

> > Also, distributing the "available" space means we leave nothing for
> > potential SR-IOV siblings... have we ended up bloting the very PCIe-
> > centric assumption that it's "unlikely" that a hotplug bridge has an
> > SR-IOV sibling ?
> 
> Looking at the code, I'm not sure we reserved any additional resource
> space for the SR-IOV even before pci_bus_distribute_available_resources()
> was introduced. We do reserve extra bus numbers for SR-IOV in
> pci_scan_child_bus_extend() so maybe we can add something similar to
> resource allocation path.

Ok. I'll look more. I think we do somewhat cater for SR-IOV in in the
bridge sizing code actually. It's a bit obscure...

I also need to look a bit more closely at what happens with
Thunderbolt.

Thanks !

Cheers
Ben.
Mika Westerberg June 25, 2019, 12:04 p.m. UTC | #6
On Tue, Jun 25, 2019 at 09:48:19PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-25 at 13:05 +0300, Mika Westerberg wrote:
> > > We only every distribute resources when using
> > > pci_assign_unassigned_bridge_resources which we only use some cases,
> > > and it's completely non obvious why we would use it there and not in
> > > other places.
> > 
> > We added it only for native PCIe hotplug path with the assumption that
> > the boot firmware takes care of the initial resource allocation. I don't
> > see any particular reason why it could not be called for other paths as
> > well, though.
> 
> Ok, we need to look into this for all the platforms who just reassign
> everything in Linux (ie, ignore whatever the boot firmware did, if it
> did anything).
> 
> I feel like all these platforms today will have a hard time getting
> anything useful out of hotplug with our default "2M" add to the hotplug
> bridges :)

Yeah, at least if Thunderbolt is involved each "daisy-chained" device
adds a complete PCIe switch running out of resources rather quick.

> > > We also don't distribute during the initial root survey meaning afaik
> > > that we get toast for any hotplug bridge that has stuff already there
> > > at boot.
> > 
> > The boot firmware obviously needs to follow the same logic. AFAICT
> > recent PCs and Macs using native PCIe hotplug handle it.
> 
> What's your experience in that area ? How (well) do they handle it in
> the boot firmware ? at least on arm64, boot firmwares are rather
> catastrophic when it comes to PCI, and on other embedded devices they
> are basically non-existent.

Well my experience is quite limited to recent Macs and PCs which usually
handle the initial resource allocation just fine. In case of Thunderbolt
some "older" PCs handle everything in firmware, even the runtime
resource allocation via SMI handler accompanied with ACPI hotplug.
Benjamin Herrenschmidt June 25, 2019, 12:23 p.m. UTC | #7
On Tue, 2019-06-25 at 15:04 +0300, Mika Westerberg wrote:
> > What's your experience in that area ? How (well) do they handle it in
> > the boot firmware ? at least on arm64, boot firmwares are rather
> > catastrophic when it comes to PCI, and on other embedded devices they
> > are basically non-existent.
> 
> Well my experience is quite limited to recent Macs and PCs which usually
> handle the initial resource allocation just fine. In case of Thunderbolt
> some "older" PCs handle everything in firmware, even the runtime
> resource allocation via SMI handler accompanied with ACPI hotplug.

Ah so this is what Lenovo calls "Thunderbolt firmware assist" in the
BIOS ? I turned that on, it did help with Linux :)

Cheers,
Ben.
Mika Westerberg June 25, 2019, 12:43 p.m. UTC | #8
On Tue, Jun 25, 2019 at 10:23:33PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-25 at 15:04 +0300, Mika Westerberg wrote:
> > > What's your experience in that area ? How (well) do they handle it in
> > > the boot firmware ? at least on arm64, boot firmwares are rather
> > > catastrophic when it comes to PCI, and on other embedded devices they
> > > are basically non-existent.
> > 
> > Well my experience is quite limited to recent Macs and PCs which usually
> > handle the initial resource allocation just fine. In case of Thunderbolt
> > some "older" PCs handle everything in firmware, even the runtime
> > resource allocation via SMI handler accompanied with ACPI hotplug.
> 
> Ah so this is what Lenovo calls "Thunderbolt firmware assist" in the
> BIOS ?

Yes, exactly.

> I turned that on, it did help with Linux :)

Well, it should also work using native PCIe with recent kernels. At
least that's what I've been trying to get working since last year ;-)
Benjamin Herrenschmidt June 25, 2019, 11:22 p.m. UTC | #9
On Tue, 2019-06-25 at 15:04 +0300, Mika Westerberg wrote:
> > What's your experience in that area ? How (well) do they handle it in
> > the boot firmware ? at least on arm64, boot firmwares are rather
> > catastrophic when it comes to PCI, and on other embedded devices they
> > are basically non-existent.
> 
> Well my experience is quite limited to recent Macs and PCs which usually
> handle the initial resource allocation just fine. In case of Thunderbolt
> some "older" PCs handle everything in firmware, even the runtime
> resource allocation via SMI handler accompanied with ACPI hotplug.

So I'm tempted to toy a bit with the "realloc everything" platforms
(all non-x86 embedded basically) use
pci_bus_distribute_available_resources on the PCIe root ports
unconditionally.

I don't think it will be a problem with SR-IOV as I very much doubt
you'll end up with a setup where we have under the root port SR-IOV
devices that are *siblings* with a switch. If that ever becomes the
case, SR-IOV should be hanlded somewhat as part of the add_list of the
bus sizing anyway.

I might do that as a test patch or behind a test config option, see how
it goes, after I've consolidated all those platforms to go through the
same generic code path, it will be a lot easier.

I'm keen to limit that to PCIe root ports though, old stuff can stay as
it is and die happily :)

Cheers,
Ben.
Bjorn Helgaas June 26, 2019, 5:35 p.m. UTC | #10
On Tue, Jun 25, 2019 at 09:45:04AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2019-06-24 at 14:24 +0300, Mika Westerberg wrote:
> > > 
> > > I'm pretty sure this patch preserves the previous behavior of
> > > pci_bus_distribute_available_resources(), but I'm not sure that
> > > behavior is what we want.
> > > 
> > > For example, in the following topology, when we process bus 10, we
> > > find two non-hotplug bridges and no hotplug bridges, so IIUC we
> > > return
> > > without distributing any resources to them.  But I would think we
> > > should try to give 10:1c.0 more space if possible because it has a
> > > hotplug bridge below it.
> > > 
> > >    00:1c.0: hotplug bridge to [bus 10-2f]
> > >      10:1c.0: non-hotplug bridge to [bus 11-2e]
> > >        11:00.0: hotplug bridge to [bus 12-2e]
> > >      10:1c.1: non-hotplug bridge to [bus 2f]
> > 
> > Yes, I agree in this case we want to preserve more space for 10:1c.0.
> 
> I sitll can't make sense of any of this stuff though.
> 
> We only every distribute resources when using
> pci_assign_unassigned_bridge_resources which we only use some cases,
> and it's completely non obvious why we would use it there and not in
> other places.
> 
> We also don't distribute during the initial root survey meaning afaik
> that we get toast for any hotplug bridge that has stuff already there
> at boot.
> 
> Also, distributing the "available" space means we leave nothing for
> potential SR-IOV siblings... have we ended up bloting the very PCIe-
> centric assumption that it's "unlikely" that a hotplug bridge has an
> SR-IOV sibling ?
> 
> This is all a terrible mess and I feel that all these little tweaks
> here or there are just making it even more impossible to completely
> grasp or predict how it will behave.

No argument about it being a mess.

I agree that tweaks clutter the history, which is definitely a
downside.  Do you think these actually change the way things work or
make the code harder to read?

I think there is value in even minor simplifications that make the
code easier to understand.  Small improvements compound over time and
expose opportunities for more significant improvement.

Bjorn
Benjamin Herrenschmidt June 26, 2019, 10:35 p.m. UTC | #11
On Wed, 2019-06-26 at 12:35 -0500, Bjorn Helgaas wrote:
> No argument about it being a mess.
> 
> I agree that tweaks clutter the history, which is definitely a
> downside.  Do you think these actually change the way things work or
> make the code harder to read?
> 
> I think there is value in even minor simplifications that make the
> code easier to understand.  Small improvements compound over time and
> expose opportunities for more significant improvement.

Oh I absolutely agree. And I love that your patches come with more cset
comment than actual patch lines :-)

The main issue I've had so far trying to untangle things is the sheer
amount of subtle changes and tweaks that went in over the year without
any useful explanation as to why things are done.

For example, do you have any idea why this:

d65245c3297ac63abc51a976d92f45f2195d2854
PCI: don't shrink bridge resources

Was added by Yinghai in 2010 ? :-)

The main issue with this is that previous to this commit,
pbus_size_{io,mem} would essentially ignore the previous state of the
bridge resources, and calculate from scratch (provided the resources
are unclaimed).

After this, it has this subtle dependency.

This is what broke Lorenzo attempts at moving pci_read_bridge_bases()
to the geneneric code a couple of years ago for example. There may be a
good reason to do that on x86, though it's not explained, but it's
definitely not right if the platform requires a full re-assignment.

Now I'll "work around" it by making that function look at the
assignment policy set by the arch/platform, but it's a fix on top of a
quirk on top of a band-aid. However, what else can we do without
understanding the root issue that lead to that patch being created ?

Cheers,
Ben.
diff mbox series

Patch

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index af28af898e42..04adeebe8866 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1887,6 +1887,9 @@  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 		return;
 	}
 
+	if (hotplug_bridges == 0)
+		return;
+
 	/*
 	 * Calculate the total amount of extra resource space we can
 	 * pass to bridges below this one.  This is basically the
@@ -1936,8 +1939,6 @@  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 		 * Distribute available extra resources equally between
 		 * hotplug-capable downstream ports taking alignment into
 		 * account.
-		 *
-		 * Here hotplug_bridges is always != 0.
 		 */
 		align = pci_resource_alignment(bridge, io_res);
 		io = div64_ul(available_io, hotplug_bridges);