diff mbox series

[v1,4/4] PCI: Allow extend_bridge_window() to shrink resource if necessary

Message ID PSXP216MB0438D3E2CFE64EBAA32AF691803C0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
State New
Headers show
Series PCI: Allow Thunderbolt to work with resources from pci=hpmemsize | expand

Commit Message

Nicholas Johnson Jan. 6, 2020, 3:48 p.m. UTC
Remove checks for resource size in extend_bridge_window(). This is
necessary to allow the pci_bus_distribute_available_resources() to
function when the kernel parameter pci=hpmemsize=nn[KMG] is used to
allocate resources. Because the kernel parameter sets the size of all
hotplug bridges to be the same, there are problems when nested hotplug
bridges are encountered. Fitting a downstream hotplug bridge with size X
and normal bridges with non-zero size Y into parent hotplug bridge with
size X is impossible, and hence the downstream hotplug bridge needs to
shrink to fit into its parent.

Add check for if bridge is extended or shrunken and reflect that in the
call to pci_dbg().

Reset the resource if its new size is zero (if we have run out of a
bridge window resource) to prevent the PCI resource assignment code from
attempting to assign a zero-sized resource.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
---
 drivers/pci/setup-bus.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Bjorn Helgaas Jan. 7, 2020, 8:34 p.m. UTC | #1
On Mon, Jan 06, 2020 at 03:48:06PM +0000, Nicholas Johnson wrote:
> Remove checks for resource size in extend_bridge_window(). This is
> necessary to allow the pci_bus_distribute_available_resources() to
> function when the kernel parameter pci=hpmemsize=nn[KMG] is used to
> allocate resources. Because the kernel parameter sets the size of all
> hotplug bridges to be the same, there are problems when nested hotplug
> bridges are encountered. Fitting a downstream hotplug bridge with size X
> and normal bridges with non-zero size Y into parent hotplug bridge with
> size X is impossible, and hence the downstream hotplug bridge needs to
> shrink to fit into its parent.

s/extend_bridge_window()/adjust_bridge_window()/ above
s/to allow the/to allow/

If this patch allows pci_bus_distribute_available_resources() to
function when pci=hpmemsize=nn is used, what happens *before* this
patch?  The text implies that pci_bus_distribute_available_resources()
doesn't function, but what happens?  Do we try to assign a downstream
bridge requiring X+n inside an upstream window of size X and the
assignment fails, leaving the downstream bridge unusable?

> Add check for if bridge is extended or shrunken and reflect that in the
> call to pci_dbg().
> 
> Reset the resource if its new size is zero (if we have run out of a
> bridge window resource) to prevent the PCI resource assignment code from
> attempting to assign a zero-sized resource.
> 
> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> ---
>  drivers/pci/setup-bus.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 0c51f4937..e7e57bf72 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1836,18 +1836,25 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
>  				 struct list_head *add_list,
>  				 resource_size_t new_size)
>  {
> -	resource_size_t add_size;
> +	resource_size_t add_size, size = resource_size(res);
>  
>  	if (res->parent)
>  		return;
>  
> -	if (resource_size(res) >= new_size)
> -		return;
> +	if (new_size > size) {
> +		add_size = new_size - size;
> +		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> +			&add_size);
> +	} else if (new_size < size) {
> +		add_size = size - new_size;
> +		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> +			&add_size);
> +	}

Where's the patch that changes the caller so "new_size" may be smaller
than "size"?  I guess it must be "[3/3] PCI: Consider alignment of
hot-added bridges ..." because that's the only one that makes a
non-trivial change, right?

> -	add_size = new_size - resource_size(res);
> -	pci_dbg(bridge, "bridge window %pR extended by %pa\n", res, &add_size);
>  	res->end = res->start + new_size - 1;
>  	remove_from_list(add_list, res);
> +	if (!new_size)
> +		reset_resource(res);

I consider reset_resource() to be deprecated because it throws away
res->flags, which tells us what kind of resource it is
(mem/io/32-bit/64-bit/prefetchable).  We learn this during
enumeration, and we shouldn't forget the information until we remove
the device.

If the resource assignment code doesn't do the right thing with a
zero-sized resource, I think we should fix that code.  Clearing the
resource struct does nothing with the hardware BAR or window
registers, so the BAR/window remains enabled unless we do something
more.  If we don't need a window and we want to disable it, we can do
that, but it requires writing special values to the hardware
registers.

Bjorn
Nicholas Johnson Jan. 8, 2020, 1:36 a.m. UTC | #2
On Tue, Jan 07, 2020 at 02:34:35PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 06, 2020 at 03:48:06PM +0000, Nicholas Johnson wrote:
> > Remove checks for resource size in extend_bridge_window(). This is
> > necessary to allow the pci_bus_distribute_available_resources() to
> > function when the kernel parameter pci=hpmemsize=nn[KMG] is used to
> > allocate resources. Because the kernel parameter sets the size of all
> > hotplug bridges to be the same, there are problems when nested hotplug
> > bridges are encountered. Fitting a downstream hotplug bridge with size X
> > and normal bridges with non-zero size Y into parent hotplug bridge with
> > size X is impossible, and hence the downstream hotplug bridge needs to
> > shrink to fit into its parent.
> 
> s/extend_bridge_window()/adjust_bridge_window()/ above
> s/to allow the/to allow/

Okay
> 
> If this patch allows pci_bus_distribute_available_resources() to
> function when pci=hpmemsize=nn is used, what happens *before* this
> patch?  The text implies that pci_bus_distribute_available_resources()
> doesn't function, but what happens?  Do we try to assign a downstream
> bridge requiring X+n inside an upstream window of size X and the
> assignment fails, leaving the downstream bridge unusable?

I could add something similar to this to the log:

The hpmemsize is applied to add_size of every hotplug bridge, even 
nested ones. Say we set hpmemsize=256M, the upstream hotplug bridge gets 
256M. Then when we hot-add a Thunderbolt device with daisy chaining, the 
new nested bridge also gets 256M and this will not fit because some 
further space has been consumed by the endpoints in the Thunderbolt 
device. Hence, we cannot extend.

It works for Mika because he is interested in the cases when the 
firmware assigns the resources, hence hpmemsize=2M (default) and it does 
not cause problems, unless we run out of space and need to go below 2M.
> 
> > Add check for if bridge is extended or shrunken and reflect that in the
> > call to pci_dbg().
> > 
> > Reset the resource if its new size is zero (if we have run out of a
> > bridge window resource) to prevent the PCI resource assignment code from
> > attempting to assign a zero-sized resource.
> > 
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > ---
> >  drivers/pci/setup-bus.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 0c51f4937..e7e57bf72 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1836,18 +1836,25 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  				 struct list_head *add_list,
> >  				 resource_size_t new_size)
> >  {
> > -	resource_size_t add_size;
> > +	resource_size_t add_size, size = resource_size(res);
> >  
> >  	if (res->parent)
> >  		return;
> >  
> > -	if (resource_size(res) >= new_size)
> > -		return;
> > +	if (new_size > size) {
> > +		add_size = new_size - size;
> > +		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> > +			&add_size);
> > +	} else if (new_size < size) {
> > +		add_size = size - new_size;
> > +		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> > +			&add_size);
> > +	}
> 
> Where's the patch that changes the caller so "new_size" may be smaller
> than "size"?  I guess it must be "[3/3] PCI: Consider alignment of
> hot-added bridges ..." because that's the only one that makes a
> non-trivial change, right?

As above, there was always a possibility of the new_size being smaller. 
For some reason, 1M is assigned to bridges, even if nothing is below 
them (for example, unused non hotplug bridges in a Thunderbolt dock). It 
may be an edge case if we are low on space, but theoretically it can 
happen.

Also, when writing this, Mika was not interested in using hpmemsize, 
which, when used, will cause new_size to be smaller than the current 
size (actual size and add_size combined).

So it does not need a patch to cause "new_size" to be smaller than 
"size" - just a change in user behaviour to use pci=hpmemsize.
> 
> > -	add_size = new_size - resource_size(res);
> > -	pci_dbg(bridge, "bridge window %pR extended by %pa\n", res, &add_size);
> >  	res->end = res->start + new_size - 1;
> >  	remove_from_list(add_list, res);
> > +	if (!new_size)
> > +		reset_resource(res);
> 
> I consider reset_resource() to be deprecated because it throws away
> res->flags, which tells us what kind of resource it is
> (mem/io/32-bit/64-bit/prefetchable).  We learn this during
> enumeration, and we shouldn't forget the information until we remove
> the device.

I will look at this, but I distinctly remember doing this because of IO 
BARs which would run out and cause the device not to be enabled.

Can you please comment on IORESOURCE_UNSET and what effect that would 
have if applied to the flags. Also, can you suggest any other solution 
other than making it handle zero-sized resources better? I agree, that 
would be ideal to make it handle zero-sized resources, but given the 
state of drivers/pci/setup-bus.c, I feel like it will just recursively 
open up more cans of worms until we metaphorically stack overflow. And I 
am happy to go down that path. But at some point I feel we will have to 
make some compromises / stop-gap measures to apply some patches and make 
progress, before going down that road. Would you agree?
> 
> If the resource assignment code doesn't do the right thing with a
> zero-sized resource, I think we should fix that code.  Clearing the
> resource struct does nothing with the hardware BAR or window
> registers, so the BAR/window remains enabled unless we do something
> more.  If we don't need a window and we want to disable it, we can do
> that, but it requires writing special values to the hardware
> registers.
> 
> Bjorn

https://lkml.org/lkml/2020/1/7/1544

You describe this as "black magic code", what appears to be the 
assignment code which handles lists of resources. And I agree. I believe 
it is in both our interests to avoid using add_size because nobody 
understands how these are handled. There may be bugs, and there is 
definitely lots of complexity involved. I believe simplicity is key. 
Hence why these changes in this series:

- Change resource size directly instead of using add_size

- Aside from the currently known cases of needing to shrink the 
resource, we cannot know that there will not be more cases of this in 
the future. There is no need for preventing it from shrinking - we have 
an available size for the bridge window, and if that happens to be 
smaller than the bridge window, we have no choice but to shrink. I 
believe this makes the check unnecessary and warrants removal.

Regards,
Nicholas
Mika Westerberg Jan. 13, 2020, 4:21 p.m. UTC | #3
On Wed, Jan 08, 2020 at 01:36:04AM +0000, Nicholas Johnson wrote:
> > Where's the patch that changes the caller so "new_size" may be smaller
> > than "size"?  I guess it must be "[3/3] PCI: Consider alignment of
> > hot-added bridges ..." because that's the only one that makes a
> > non-trivial change, right?
> 
> As above, there was always a possibility of the new_size being smaller. 
> For some reason, 1M is assigned to bridges, even if nothing is below 
> them (for example, unused non hotplug bridges in a Thunderbolt dock). It 
> may be an edge case if we are low on space, but theoretically it can 
> happen.
> 
> Also, when writing this, Mika was not interested in using hpmemsize, 
> which, when used, will cause new_size to be smaller than the current 
> size (actual size and add_size combined).

Just a small correction here about my motivation. So I'm testing on a
hardware where the BIOS assigns initial resources to the root/downstream
ports which is the majority of Thunderbolt capable PC systems nowadays.
Therefore the user does not need to pass any additional command line
parameters to get the ports working properly.

However, I'm of course interested in getting Linux PCI resource
management as good as possible regardless whether the firmware/BIOS
assigns them or not ;-)
Nicholas Johnson Jan. 15, 2020, 2:51 p.m. UTC | #4
On Mon, Jan 13, 2020 at 06:21:50PM +0200, Mika Westerberg wrote:
> On Wed, Jan 08, 2020 at 01:36:04AM +0000, Nicholas Johnson wrote:
> > > Where's the patch that changes the caller so "new_size" may be smaller
> > > than "size"?  I guess it must be "[3/3] PCI: Consider alignment of
> > > hot-added bridges ..." because that's the only one that makes a
> > > non-trivial change, right?
> > 
> > As above, there was always a possibility of the new_size being smaller. 
> > For some reason, 1M is assigned to bridges, even if nothing is below 
> > them (for example, unused non hotplug bridges in a Thunderbolt dock). It 
> > may be an edge case if we are low on space, but theoretically it can 
> > happen.
> > 
> > Also, when writing this, Mika was not interested in using hpmemsize, 
> > which, when used, will cause new_size to be smaller than the current 
> > size (actual size and add_size combined).
> 
> Just a small correction here about my motivation. So I'm testing on a
> hardware where the BIOS assigns initial resources to the root/downstream
> ports which is the majority of Thunderbolt capable PC systems nowadays.
> Therefore the user does not need to pass any additional command line
> parameters to get the ports working properly.
> 
> However, I'm of course interested in getting Linux PCI resource
> management as good as possible regardless whether the firmware/BIOS
> assigns them or not ;-)
Sorry, I was not meant to say you were not interested in getting it as 
good as possible. At the time, you had a goal to achieve (which you did) 
and at that point in time, it would not have been feasible to use 
pci=hpmemsize or similar before my patches were applied:

  c13704f5685d ("PCI: Avoid double hpmemsize MMIO window assignment")
  d7b8a217521c ("PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters")

What I was trying to say was not that you were not interested, but more 
that it was not a primary motivation for you at the time. Does this 
sound more accurate? Poor wording on my behalf.
Mika Westerberg Jan. 15, 2020, 2:59 p.m. UTC | #5
On Wed, Jan 15, 2020 at 02:51:02PM +0000, Nicholas Johnson wrote:
> Sorry, I was not meant to say you were not interested in getting it as 
> good as possible. At the time, you had a goal to achieve (which you did) 
> and at that point in time, it would not have been feasible to use 
> pci=hpmemsize or similar before my patches were applied:
> 
>   c13704f5685d ("PCI: Avoid double hpmemsize MMIO window assignment")
>   d7b8a217521c ("PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters")
> 
> What I was trying to say was not that you were not interested, but more 
> that it was not a primary motivation for you at the time. Does this 
> sound more accurate? Poor wording on my behalf.

Yes, it does and no worries :-) Just wanted to clarify that one.
diff mbox series

Patch

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0c51f4937..e7e57bf72 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1836,18 +1836,25 @@  static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
 				 struct list_head *add_list,
 				 resource_size_t new_size)
 {
-	resource_size_t add_size;
+	resource_size_t add_size, size = resource_size(res);
 
 	if (res->parent)
 		return;
 
-	if (resource_size(res) >= new_size)
-		return;
+	if (new_size > size) {
+		add_size = new_size - size;
+		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
+			&add_size);
+	} else if (new_size < size) {
+		add_size = size - new_size;
+		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
+			&add_size);
+	}
 
-	add_size = new_size - resource_size(res);
-	pci_dbg(bridge, "bridge window %pR extended by %pa\n", res, &add_size);
 	res->end = res->start + new_size - 1;
 	remove_from_list(add_list, res);
+	if (!new_size)
+		reset_resource(res);
 }
 
 static void pci_bus_distribute_available_resources(struct pci_bus *bus,