diff mbox series

[v3,1/1] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window

Message ID PS2P216MB075563AA6AD242AA666EDC6A80760@PS2P216MB0755.KORP216.PROD.OUTLOOK.COM
State Accepted
Delegated to: Bjorn Helgaas
Headers show
Series Fix bug resulting in double hpmemsize being assigned to MMIO window | expand

Commit Message

Nicholas Johnson Nov. 13, 2019, 3:25 p.m. UTC
Currently, the kernel can sometimes assign the MMIO_PREF window
additional size into the MMIO window, resulting in extra MMIO additional
size, despite the MMIO_PREF additional size being assigned successfully
into the MMIO_PREF window.

This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
fails. In the next pass, because MMIO_PREF is already assigned, the
attempt to assign MMIO_PREF returns an error code instead of success
(nothing more to do, already allocated). Hence, the size which is
actually allocated, but thought to have failed, is placed in the MMIO
window.

Example of problem (more context can be found in the bug report URL):

Mainline kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M

Patched kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M

This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
with the same configuration, with a Ubuntu mainline kernel and a kernel
patched with this patch.

The bug results in the MMIO_PREF being added to the MMIO window, which
means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
the MMIO window will likely fail to be assigned altogether due to lack
of 32-bit address space.

Change find_free_bus_resource() to do the following:
- Return first unassigned resource of the correct type.
- If none of the above, return first assigned resource of the correct type.
- If none of the above, return NULL.

Returning an assigned resource of the correct type allows the caller to
distinguish between already assigned and no resource of the correct type.

Rename find_free_bus_resource to find_bus_resource_of_type().

Add checks in pbus_size_io() and pbus_size_mem() to return success if
resource returned from find_free_bus_resource() is already allocated.

This avoids pbus_size_io() and pbus_size_mem() returning error code to
__pci_bus_size_bridges() when a resource has been successfully assigned
in a previous pass. This fixes the existing behaviour where space for a
resource could be reserved multiple times in different parent bridge
windows.

Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243

Reported-by: Kit Chow <kchow@gigaio.com>
Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/setup-bus.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

Comments

Logan Gunthorpe Nov. 13, 2019, 4:50 p.m. UTC | #1
On 2019-11-13 8:25 a.m., Nicholas Johnson wrote:
> Currently, the kernel can sometimes assign the MMIO_PREF window
> additional size into the MMIO window, resulting in extra MMIO additional
> size, despite the MMIO_PREF additional size being assigned successfully
> into the MMIO_PREF window.
> 
> This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
> fails. In the next pass, because MMIO_PREF is already assigned, the
> attempt to assign MMIO_PREF returns an error code instead of success
> (nothing more to do, already allocated). Hence, the size which is
> actually allocated, but thought to have failed, is placed in the MMIO
> window.
> 
> Example of problem (more context can be found in the bug report URL):
> 
> Mainline kernel:
> pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
> pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
> 
> Patched kernel:
> pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
> pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
> 
> This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
> with the same configuration, with a Ubuntu mainline kernel and a kernel
> patched with this patch.
> 
> The bug results in the MMIO_PREF being added to the MMIO window, which
> means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
> the MMIO window will likely fail to be assigned altogether due to lack
> of 32-bit address space.
> 
> Change find_free_bus_resource() to do the following:
> - Return first unassigned resource of the correct type.
> - If none of the above, return first assigned resource of the correct type.
> - If none of the above, return NULL.
> 
> Returning an assigned resource of the correct type allows the caller to
> distinguish between already assigned and no resource of the correct type.
> 
> Rename find_free_bus_resource to find_bus_resource_of_type().
> 
> Add checks in pbus_size_io() and pbus_size_mem() to return success if
> resource returned from find_free_bus_resource() is already allocated.
> 
> This avoids pbus_size_io() and pbus_size_mem() returning error code to
> __pci_bus_size_bridges() when a resource has been successfully assigned
> in a previous pass. This fixes the existing behaviour where space for a
> resource could be reserved multiple times in different parent bridge
> windows.
> 
> Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243
> 
> Reported-by: Kit Chow <kchow@gigaio.com>
> Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

You can add mine as well:

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Thanks,

Logan
Bjorn Helgaas Nov. 14, 2019, 4:56 p.m. UTC | #2
On Wed, Nov 13, 2019 at 03:25:28PM +0000, Nicholas Johnson wrote:
> Currently, the kernel can sometimes assign the MMIO_PREF window
> additional size into the MMIO window, resulting in extra MMIO additional
> size, despite the MMIO_PREF additional size being assigned successfully
> into the MMIO_PREF window.
> 
> This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
> fails. In the next pass, because MMIO_PREF is already assigned, the
> attempt to assign MMIO_PREF returns an error code instead of success
> (nothing more to do, already allocated). Hence, the size which is
> actually allocated, but thought to have failed, is placed in the MMIO
> window.
> 
> Example of problem (more context can be found in the bug report URL):
> 
> Mainline kernel:
> pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
> pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
> 
> Patched kernel:
> pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
> pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
> 
> This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
> with the same configuration, with a Ubuntu mainline kernel and a kernel
> patched with this patch.
> 
> The bug results in the MMIO_PREF being added to the MMIO window, which
> means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
> the MMIO window will likely fail to be assigned altogether due to lack
> of 32-bit address space.
> 
> Change find_free_bus_resource() to do the following:
> - Return first unassigned resource of the correct type.
> - If none of the above, return first assigned resource of the correct type.
> - If none of the above, return NULL.
> 
> Returning an assigned resource of the correct type allows the caller to
> distinguish between already assigned and no resource of the correct type.
> 
> Rename find_free_bus_resource to find_bus_resource_of_type().
> 
> Add checks in pbus_size_io() and pbus_size_mem() to return success if
> resource returned from find_free_bus_resource() is already allocated.
> 
> This avoids pbus_size_io() and pbus_size_mem() returning error code to
> __pci_bus_size_bridges() when a resource has been successfully assigned
> in a previous pass. This fixes the existing behaviour where space for a
> resource could be reserved multiple times in different parent bridge
> windows.
> 
> Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243
> 
> Reported-by: Kit Chow <kchow@gigaio.com>
> Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Applied with reviewed-by from Mika and Logan to pci/resource for v5.5,
thanks!

> ---
>  drivers/pci/setup-bus.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 6b64bf909..f382f449b 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -752,24 +752,32 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
>  }
>  
>  /*
> - * Helper function for sizing routines: find first available bus resource
> - * of a given type.  Note: we intentionally skip the bus resources which
> - * have already been assigned (that is, have non-NULL parent resource).
> + * Helper function for sizing routines.
> + * Assigned resources have non-NULL parent resource.
> + *
> + * Return first unassigned resource of the correct type.
> + * If none of the above, return first assigned resource of the correct type.
> + * If none of the above, return NULL.
> + *
> + * Returning an assigned resource of the correct type allows the caller to
> + * distinguish between already assigned and no resource of the correct type.
>   */
> -static struct resource *find_free_bus_resource(struct pci_bus *bus,
> -					       unsigned long type_mask,
> -					       unsigned long type)
> +static struct resource *find_bus_resource_of_type(struct pci_bus *bus,
> +						  unsigned long type_mask,
> +						  unsigned long type)
>  {
> +	struct resource *r, *r_assigned = NULL;
>  	int i;
> -	struct resource *r;
>  
>  	pci_bus_for_each_resource(bus, r, i) {
>  		if (r == &ioport_resource || r == &iomem_resource)
>  			continue;
>  		if (r && (r->flags & type_mask) == type && !r->parent)
>  			return r;
> +		if (r && (r->flags & type_mask) == type && !r_assigned)
> +			r_assigned = r;
>  	}
> -	return NULL;
> +	return r_assigned;
>  }
>  
>  static resource_size_t calculate_iosize(resource_size_t size,
> @@ -866,8 +874,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
>  			 struct list_head *realloc_head)
>  {
>  	struct pci_dev *dev;
> -	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
> -							IORESOURCE_IO);
> +	struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
> +								IORESOURCE_IO);
>  	resource_size_t size = 0, size0 = 0, size1 = 0;
>  	resource_size_t children_add_size = 0;
>  	resource_size_t min_align, align;
> @@ -875,6 +883,10 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
>  	if (!b_res)
>  		return;
>  
> +	/* If resource is already assigned, nothing more to do. */
> +	if (b_res->parent)
> +		return;
> +
>  	min_align = window_alignment(bus, IORESOURCE_IO);
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
>  		int i;
> @@ -978,7 +990,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
>  	resource_size_t min_align, align, size, size0, size1;
>  	resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */
>  	int order, max_order;
> -	struct resource *b_res = find_free_bus_resource(bus,
> +	struct resource *b_res = find_bus_resource_of_type(bus,
>  					mask | IORESOURCE_PREFETCH, type);
>  	resource_size_t children_add_size = 0;
>  	resource_size_t children_add_align = 0;
> @@ -987,6 +999,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
>  	if (!b_res)
>  		return -ENOSPC;
>  
> +	/* If resource is already assigned, nothing more to do. */
> +	if (b_res->parent)
> +		return 0;
> +
>  	memset(aligns, 0, sizeof(aligns));
>  	max_order = 0;
>  	size = 0;
> -- 
> 2.24.0
>
Nicholas Johnson Nov. 16, 2019, 2:54 a.m. UTC | #3
On Thu, Nov 14, 2019 at 10:56:37AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 13, 2019 at 03:25:28PM +0000, Nicholas Johnson wrote:
> > Currently, the kernel can sometimes assign the MMIO_PREF window
> > additional size into the MMIO window, resulting in extra MMIO additional
> > size, despite the MMIO_PREF additional size being assigned successfully
> > into the MMIO_PREF window.
> > 
> > This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
> > fails. In the next pass, because MMIO_PREF is already assigned, the
> > attempt to assign MMIO_PREF returns an error code instead of success
> > (nothing more to do, already allocated). Hence, the size which is
> > actually allocated, but thought to have failed, is placed in the MMIO
> > window.
> > 
> > Example of problem (more context can be found in the bug report URL):
> > 
> > Mainline kernel:
> > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
> > pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
> > 
> > Patched kernel:
> > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
> > pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
> > 
> > This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
> > with the same configuration, with a Ubuntu mainline kernel and a kernel
> > patched with this patch.
> > 
> > The bug results in the MMIO_PREF being added to the MMIO window, which
> > means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
> > the MMIO window will likely fail to be assigned altogether due to lack
> > of 32-bit address space.
> > 
> > Change find_free_bus_resource() to do the following:
> > - Return first unassigned resource of the correct type.
> > - If none of the above, return first assigned resource of the correct type.
> > - If none of the above, return NULL.
> > 
> > Returning an assigned resource of the correct type allows the caller to
> > distinguish between already assigned and no resource of the correct type.
> > 
> > Rename find_free_bus_resource to find_bus_resource_of_type().
> > 
> > Add checks in pbus_size_io() and pbus_size_mem() to return success if
> > resource returned from find_free_bus_resource() is already allocated.
> > 
> > This avoids pbus_size_io() and pbus_size_mem() returning error code to
> > __pci_bus_size_bridges() when a resource has been successfully assigned
> > in a previous pass. This fixes the existing behaviour where space for a
> > resource could be reserved multiple times in different parent bridge
> > windows.
> > 
> > Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243
> > 
> > Reported-by: Kit Chow <kchow@gigaio.com>
> > Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with reviewed-by from Mika and Logan to pci/resource for v5.5,
> thanks!

Awesome, thanks. Now where do we stand on my last four patches?
I hope there will be Linux 5.4-rc8, which will give us another week.

> 
> > ---
> >  drivers/pci/setup-bus.c | 38 +++++++++++++++++++++++++++-----------
> >  1 file changed, 27 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 6b64bf909..f382f449b 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -752,24 +752,32 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> >  }
> >  
> >  /*
> > - * Helper function for sizing routines: find first available bus resource
> > - * of a given type.  Note: we intentionally skip the bus resources which
> > - * have already been assigned (that is, have non-NULL parent resource).
> > + * Helper function for sizing routines.
> > + * Assigned resources have non-NULL parent resource.
> > + *
> > + * Return first unassigned resource of the correct type.
> > + * If none of the above, return first assigned resource of the correct type.
> > + * If none of the above, return NULL.
> > + *
> > + * Returning an assigned resource of the correct type allows the caller to
> > + * distinguish between already assigned and no resource of the correct type.
> >   */
> > -static struct resource *find_free_bus_resource(struct pci_bus *bus,
> > -					       unsigned long type_mask,
> > -					       unsigned long type)
> > +static struct resource *find_bus_resource_of_type(struct pci_bus *bus,
> > +						  unsigned long type_mask,
> > +						  unsigned long type)
> >  {
> > +	struct resource *r, *r_assigned = NULL;
> >  	int i;
> > -	struct resource *r;
> >  
> >  	pci_bus_for_each_resource(bus, r, i) {
> >  		if (r == &ioport_resource || r == &iomem_resource)
> >  			continue;
> >  		if (r && (r->flags & type_mask) == type && !r->parent)
> >  			return r;
> > +		if (r && (r->flags & type_mask) == type && !r_assigned)
> > +			r_assigned = r;
> >  	}
> > -	return NULL;
> > +	return r_assigned;
> >  }
> >  
> >  static resource_size_t calculate_iosize(resource_size_t size,
> > @@ -866,8 +874,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> >  			 struct list_head *realloc_head)
> >  {
> >  	struct pci_dev *dev;
> > -	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
> > -							IORESOURCE_IO);
> > +	struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
> > +								IORESOURCE_IO);
> >  	resource_size_t size = 0, size0 = 0, size1 = 0;
> >  	resource_size_t children_add_size = 0;
> >  	resource_size_t min_align, align;
> > @@ -875,6 +883,10 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> >  	if (!b_res)
> >  		return;
> >  
> > +	/* If resource is already assigned, nothing more to do. */
> > +	if (b_res->parent)
> > +		return;
> > +
> >  	min_align = window_alignment(bus, IORESOURCE_IO);
> >  	list_for_each_entry(dev, &bus->devices, bus_list) {
> >  		int i;
> > @@ -978,7 +990,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> >  	resource_size_t min_align, align, size, size0, size1;
> >  	resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */
> >  	int order, max_order;
> > -	struct resource *b_res = find_free_bus_resource(bus,
> > +	struct resource *b_res = find_bus_resource_of_type(bus,
> >  					mask | IORESOURCE_PREFETCH, type);
> >  	resource_size_t children_add_size = 0;
> >  	resource_size_t children_add_align = 0;
> > @@ -987,6 +999,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> >  	if (!b_res)
> >  		return -ENOSPC;
> >  
> > +	/* If resource is already assigned, nothing more to do. */
> > +	if (b_res->parent)
> > +		return 0;
> > +
> >  	memset(aligns, 0, sizeof(aligns));
> >  	max_order = 0;
> >  	size = 0;
> > -- 
> > 2.24.0
> >
Nicholas Johnson Nov. 18, 2019, 9:43 a.m. UTC | #4
On Thu, Nov 14, 2019 at 10:56:37AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 13, 2019 at 03:25:28PM +0000, Nicholas Johnson wrote:
> > Currently, the kernel can sometimes assign the MMIO_PREF window
> > additional size into the MMIO window, resulting in extra MMIO additional
> > size, despite the MMIO_PREF additional size being assigned successfully
> > into the MMIO_PREF window.
> > 
> > This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
> > fails. In the next pass, because MMIO_PREF is already assigned, the
> > attempt to assign MMIO_PREF returns an error code instead of success
> > (nothing more to do, already allocated). Hence, the size which is
> > actually allocated, but thought to have failed, is placed in the MMIO
> > window.
> > 
> > Example of problem (more context can be found in the bug report URL):
> > 
> > Mainline kernel:
> > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
> > pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
> > 
> > Patched kernel:
> > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
> > pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
> > 
> > This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
> > with the same configuration, with a Ubuntu mainline kernel and a kernel
> > patched with this patch.
> > 
> > The bug results in the MMIO_PREF being added to the MMIO window, which
> > means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
> > the MMIO window will likely fail to be assigned altogether due to lack
> > of 32-bit address space.
> > 
> > Change find_free_bus_resource() to do the following:
> > - Return first unassigned resource of the correct type.
> > - If none of the above, return first assigned resource of the correct type.
> > - If none of the above, return NULL.
> > 
> > Returning an assigned resource of the correct type allows the caller to
> > distinguish between already assigned and no resource of the correct type.
> > 
> > Rename find_free_bus_resource to find_bus_resource_of_type().
> > 
> > Add checks in pbus_size_io() and pbus_size_mem() to return success if
> > resource returned from find_free_bus_resource() is already allocated.
> > 
> > This avoids pbus_size_io() and pbus_size_mem() returning error code to
> > __pci_bus_size_bridges() when a resource has been successfully assigned
> > in a previous pass. This fixes the existing behaviour where space for a
> > resource could be reserved multiple times in different parent bridge
> > windows.
> > 
> > Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243
> > 
> > Reported-by: Kit Chow <kchow@gigaio.com>
> > Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> Applied with reviewed-by from Mika and Logan to pci/resource for v5.5,
> thanks!

We have v5.4-rc8, so there is one more week. Please let me know if you 
have any concerns about the other four patches so that I may address 
them ASAP. If you are worried about the first one, I can re-post the 
series with it at the end, so that the others can be taken first.

Thanks!

Regards,
Nicholas

> 
> > ---
> >  drivers/pci/setup-bus.c | 38 +++++++++++++++++++++++++++-----------
> >  1 file changed, 27 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 6b64bf909..f382f449b 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -752,24 +752,32 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> >  }
> >  
> >  /*
> > - * Helper function for sizing routines: find first available bus resource
> > - * of a given type.  Note: we intentionally skip the bus resources which
> > - * have already been assigned (that is, have non-NULL parent resource).
> > + * Helper function for sizing routines.
> > + * Assigned resources have non-NULL parent resource.
> > + *
> > + * Return first unassigned resource of the correct type.
> > + * If none of the above, return first assigned resource of the correct type.
> > + * If none of the above, return NULL.
> > + *
> > + * Returning an assigned resource of the correct type allows the caller to
> > + * distinguish between already assigned and no resource of the correct type.
> >   */
> > -static struct resource *find_free_bus_resource(struct pci_bus *bus,
> > -					       unsigned long type_mask,
> > -					       unsigned long type)
> > +static struct resource *find_bus_resource_of_type(struct pci_bus *bus,
> > +						  unsigned long type_mask,
> > +						  unsigned long type)
> >  {
> > +	struct resource *r, *r_assigned = NULL;
> >  	int i;
> > -	struct resource *r;
> >  
> >  	pci_bus_for_each_resource(bus, r, i) {
> >  		if (r == &ioport_resource || r == &iomem_resource)
> >  			continue;
> >  		if (r && (r->flags & type_mask) == type && !r->parent)
> >  			return r;
> > +		if (r && (r->flags & type_mask) == type && !r_assigned)
> > +			r_assigned = r;
> >  	}
> > -	return NULL;
> > +	return r_assigned;
> >  }
> >  
> >  static resource_size_t calculate_iosize(resource_size_t size,
> > @@ -866,8 +874,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> >  			 struct list_head *realloc_head)
> >  {
> >  	struct pci_dev *dev;
> > -	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
> > -							IORESOURCE_IO);
> > +	struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
> > +								IORESOURCE_IO);
> >  	resource_size_t size = 0, size0 = 0, size1 = 0;
> >  	resource_size_t children_add_size = 0;
> >  	resource_size_t min_align, align;
> > @@ -875,6 +883,10 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> >  	if (!b_res)
> >  		return;
> >  
> > +	/* If resource is already assigned, nothing more to do. */
> > +	if (b_res->parent)
> > +		return;
> > +
> >  	min_align = window_alignment(bus, IORESOURCE_IO);
> >  	list_for_each_entry(dev, &bus->devices, bus_list) {
> >  		int i;
> > @@ -978,7 +990,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> >  	resource_size_t min_align, align, size, size0, size1;
> >  	resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */
> >  	int order, max_order;
> > -	struct resource *b_res = find_free_bus_resource(bus,
> > +	struct resource *b_res = find_bus_resource_of_type(bus,
> >  					mask | IORESOURCE_PREFETCH, type);
> >  	resource_size_t children_add_size = 0;
> >  	resource_size_t children_add_align = 0;
> > @@ -987,6 +999,10 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> >  	if (!b_res)
> >  		return -ENOSPC;
> >  
> > +	/* If resource is already assigned, nothing more to do. */
> > +	if (b_res->parent)
> > +		return 0;
> > +
> >  	memset(aligns, 0, sizeof(aligns));
> >  	max_order = 0;
> >  	size = 0;
> > -- 
> > 2.24.0
> >
Bjorn Helgaas Nov. 18, 2019, 10:58 p.m. UTC | #5
On Mon, Nov 18, 2019 at 09:43:34AM +0000, Nicholas Johnson wrote:
> On Thu, Nov 14, 2019 at 10:56:37AM -0600, Bjorn Helgaas wrote:
> > On Wed, Nov 13, 2019 at 03:25:28PM +0000, Nicholas Johnson wrote:
> > > Currently, the kernel can sometimes assign the MMIO_PREF window
> > > additional size into the MMIO window, resulting in extra MMIO additional
> > > size, despite the MMIO_PREF additional size being assigned successfully
> > > into the MMIO_PREF window.
> > > 
> > > This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
> > > fails. In the next pass, because MMIO_PREF is already assigned, the
> > > attempt to assign MMIO_PREF returns an error code instead of success
> > > (nothing more to do, already allocated). Hence, the size which is
> > > actually allocated, but thought to have failed, is placed in the MMIO
> > > window.
> > > 
> > > Example of problem (more context can be found in the bug report URL):
> > > 
> > > Mainline kernel:
> > > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
> > > pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
> > > 
> > > Patched kernel:
> > > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
> > > pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
> > > 
> > > This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
> > > with the same configuration, with a Ubuntu mainline kernel and a kernel
> > > patched with this patch.
> > > 
> > > The bug results in the MMIO_PREF being added to the MMIO window, which
> > > means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
> > > the MMIO window will likely fail to be assigned altogether due to lack
> > > of 32-bit address space.
> > > 
> > > Change find_free_bus_resource() to do the following:
> > > - Return first unassigned resource of the correct type.
> > > - If none of the above, return first assigned resource of the correct type.
> > > - If none of the above, return NULL.
> > > 
> > > Returning an assigned resource of the correct type allows the caller to
> > > distinguish between already assigned and no resource of the correct type.
> > > 
> > > Rename find_free_bus_resource to find_bus_resource_of_type().
> > > 
> > > Add checks in pbus_size_io() and pbus_size_mem() to return success if
> > > resource returned from find_free_bus_resource() is already allocated.
> > > 
> > > This avoids pbus_size_io() and pbus_size_mem() returning error code to
> > > __pci_bus_size_bridges() when a resource has been successfully assigned
> > > in a previous pass. This fixes the existing behaviour where space for a
> > > resource could be reserved multiple times in different parent bridge
> > > windows.
> > > 
> > > Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243
> > > 
> > > Reported-by: Kit Chow <kchow@gigaio.com>
> > > Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > 
> > Applied with reviewed-by from Mika and Logan to pci/resource for v5.5,
> > thanks!
> 
> We have v5.4-rc8, so there is one more week. Please let me know if you 
> have any concerns about the other four patches so that I may address 
> them ASAP. If you are worried about the first one, I can re-post the 
> series with it at the end, so that the others can be taken first.

I assume you're talking about this:

  [PATCH v11 0/4] Patch series to assist Thunderbolt allocation with kernel parameters

I hope to merge those early in the next cycle so we get some time in
linux-next for wider testing.  It's later in the v5.5 cycle than I
would be comfortable with.

Bjorn
Nicholas Johnson Nov. 19, 2019, 3:17 a.m. UTC | #6
On Mon, Nov 18, 2019 at 04:58:02PM -0600, Bjorn Helgaas wrote:
> On Mon, Nov 18, 2019 at 09:43:34AM +0000, Nicholas Johnson wrote:
> > On Thu, Nov 14, 2019 at 10:56:37AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Nov 13, 2019 at 03:25:28PM +0000, Nicholas Johnson wrote:
> > > > Currently, the kernel can sometimes assign the MMIO_PREF window
> > > > additional size into the MMIO window, resulting in extra MMIO additional
> > > > size, despite the MMIO_PREF additional size being assigned successfully
> > > > into the MMIO_PREF window.
> > > > 
> > > > This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
> > > > fails. In the next pass, because MMIO_PREF is already assigned, the
> > > > attempt to assign MMIO_PREF returns an error code instead of success
> > > > (nothing more to do, already allocated). Hence, the size which is
> > > > actually allocated, but thought to have failed, is placed in the MMIO
> > > > window.
> > > > 
> > > > Example of problem (more context can be found in the bug report URL):
> > > > 
> > > > Mainline kernel:
> > > > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
> > > > pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M
> > > > 
> > > > Patched kernel:
> > > > pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
> > > > pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M
> > > > 
> > > > This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
> > > > with the same configuration, with a Ubuntu mainline kernel and a kernel
> > > > patched with this patch.
> > > > 
> > > > The bug results in the MMIO_PREF being added to the MMIO window, which
> > > > means doubling if MMIO_PREF size = MMIO size. With a large MMIO_PREF,
> > > > the MMIO window will likely fail to be assigned altogether due to lack
> > > > of 32-bit address space.
> > > > 
> > > > Change find_free_bus_resource() to do the following:
> > > > - Return first unassigned resource of the correct type.
> > > > - If none of the above, return first assigned resource of the correct type.
> > > > - If none of the above, return NULL.
> > > > 
> > > > Returning an assigned resource of the correct type allows the caller to
> > > > distinguish between already assigned and no resource of the correct type.
> > > > 
> > > > Rename find_free_bus_resource to find_bus_resource_of_type().
> > > > 
> > > > Add checks in pbus_size_io() and pbus_size_mem() to return success if
> > > > resource returned from find_free_bus_resource() is already allocated.
> > > > 
> > > > This avoids pbus_size_io() and pbus_size_mem() returning error code to
> > > > __pci_bus_size_bridges() when a resource has been successfully assigned
> > > > in a previous pass. This fixes the existing behaviour where space for a
> > > > resource could be reserved multiple times in different parent bridge
> > > > windows.
> > > > 
> > > > Link: https://lore.kernel.org/lkml/20190531171216.20532-2-logang@deltatee.com/T/#u
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203243
> > > > 
> > > > Reported-by: Kit Chow <kchow@gigaio.com>
> > > > Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > > > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > 
> > > Applied with reviewed-by from Mika and Logan to pci/resource for v5.5,
> > > thanks!
> > 
> > We have v5.4-rc8, so there is one more week. Please let me know if you 
> > have any concerns about the other four patches so that I may address 
> > them ASAP. If you are worried about the first one, I can re-post the 
> > series with it at the end, so that the others can be taken first.
> 
> I assume you're talking about this:
> 
>   [PATCH v11 0/4] Patch series to assist Thunderbolt allocation with kernel parameters
> 
> I hope to merge those early in the next cycle so we get some time in
> linux-next for wider testing.  It's later in the v5.5 cycle than I
> would be comfortable with.
> 
> Bjorn
Fair enough. Thanks for this info. :)

I did just discover linux-next and I built it. Should I be doing this 
more often to help find regressions?

I will now concentrate on fixing the problem where pci=nocrs does not 
ignore the bus resource. One motherboard I own gives 00-7e or similar, 
instead of 00-ff. The nocrs does not help, and I had to patch the kernel 
myself. Only acpi=off fixes the problem, while knocking out SMT (MADT), 
IOMMU (DMAR) and the ability to suspend without crashing.

If you disagree that nocrs should override bus resource, then let me 
know and I will not attempt this.

Nicholas
Bjorn Helgaas Nov. 19, 2019, 1:38 p.m. UTC | #7
On Tue, Nov 19, 2019 at 03:17:04AM +0000, Nicholas Johnson wrote:
> I did just discover linux-next and I built it. Should I be doing this 
> more often to help find regressions?

Yes, if you build and run linux-next, that's a great service because
it helps find problems before they appear in mainline.

> I will now concentrate on fixing the problem where pci=nocrs does not 
> ignore the bus resource. One motherboard I own gives 00-7e or similar, 
> instead of 00-ff. The nocrs does not help, and I had to patch the kernel 
> myself. Only acpi=off fixes the problem, while knocking out SMT (MADT), 
> IOMMU (DMAR) and the ability to suspend without crashing.
> 
> If you disagree that nocrs should override bus resource, then let me 
> know and I will not attempt this.

I guess the problem is that with "pci=nocrs", we ignore the MMIO and
I/O port resources from _CRS, but we still pay attention to bus number
resources in _CRS?  That does sound like it would be unexpected
behavior.

I *would* like to see the complete dmesg log because these _CRS
methods are pretty reliable because Windows relies on them as well, so
problems are frequently a result of Linux defects.  If we can fix
Linux or automatically work around issues so users don't have to use
"pci=nocrs" explicitly, that's the best.

Bjorn
Nicholas Johnson Nov. 21, 2019, 2:52 p.m. UTC | #8
On Tue, Nov 19, 2019 at 07:38:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2019 at 03:17:04AM +0000, Nicholas Johnson wrote:
> > I did just discover linux-next and I built it. Should I be doing this 
> > more often to help find regressions?
> 
> Yes, if you build and run linux-next, that's a great service because
> it helps find problems before they appear in mainline.

Funnily enough, I just built Linux next-20191121 and it has a NULL 
dereference on start-up, which renders the system unusable.

Can anybody else please confirm? I enabled most of the new options since 
the last linux-next a few days before.

I did just compile on an i7-4770K using my USB SSD to boot. I suppose 
there is a tiny chance that the CPU had an error and produced bad code. 
It is not my machine. It was pegged at 100 degrees Celsius the whole 
time.... I do find it hard to believe that I am the first to notice it, 
though. I cannot find any bug reports on this.

If this turns out to be an actual bug, is there a preferred way to 
report it? It is probably not from pci subsystem.

I can do a bisect, but they consume a lot of time on a slow system.

Here is a preliminary bug report (assuming you are meant to report 
linux-next bugs here):
https://bugzilla.kernel.org/show_bug.cgi?id=205621

Cheers!

Regards,
Nicholas Johnson

> 
> > I will now concentrate on fixing the problem where pci=nocrs does not 
> > ignore the bus resource. One motherboard I own gives 00-7e or similar, 
> > instead of 00-ff. The nocrs does not help, and I had to patch the kernel 
> > myself. Only acpi=off fixes the problem, while knocking out SMT (MADT), 
> > IOMMU (DMAR) and the ability to suspend without crashing.
> > 
> > If you disagree that nocrs should override bus resource, then let me 
> > know and I will not attempt this.
> 
> I guess the problem is that with "pci=nocrs", we ignore the MMIO and
> I/O port resources from _CRS, but we still pay attention to bus number
> resources in _CRS?  That does sound like it would be unexpected
> behavior.
> 
> I *would* like to see the complete dmesg log because these _CRS
> methods are pretty reliable because Windows relies on them as well, so
> problems are frequently a result of Linux defects.  If we can fix
> Linux or automatically work around issues so users don't have to use
> "pci=nocrs" explicitly, that's the best.
> 
> Bjorn
Bjorn Helgaas Nov. 21, 2019, 5:55 p.m. UTC | #9
[+cc Naresh]

On Thu, Nov 21, 2019 at 02:52:41PM +0000, Nicholas Johnson wrote:
> On Tue, Nov 19, 2019 at 07:38:28AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 19, 2019 at 03:17:04AM +0000, Nicholas Johnson wrote:
> > > I did just discover linux-next and I built it. Should I be doing this 
> > > more often to help find regressions?
> > 
> > Yes, if you build and run linux-next, that's a great service because
> > it helps find problems before they appear in mainline.
> 
> Funnily enough, I just built Linux next-20191121 and it has a NULL 
> dereference on start-up, which renders the system unusable.
> 
> Can anybody else please confirm? I enabled most of the new options since 
> the last linux-next a few days before.
> ...

> Here is a preliminary bug report (assuming you are meant to report 
> linux-next bugs here):
> https://bugzilla.kernel.org/show_bug.cgi?id=205621

Looks similar to these reports I found by searching for kernfs_find_ns on
https://lore.kernel.org/lkml:

  https://lore.kernel.org/r/000000000000bdfc8e0597ddbde4@google.com
  https://lore.kernel.org/r/000000000000c1e9090597ddbd9d@google.com
  https://lore.kernel.org/r/CA+G9fYsrRw2SRb92eROCt6aU==j6Qr9Fe4AmJyn4fMj5gDFt=w@mail.gmail.com

Thanks very much for doing this testing and reporting the results.  I
don't think there's anything more you need to do.  It looks like
Naresh already cc'd some likely candidates.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6b64bf909..f382f449b 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -752,24 +752,32 @@  static void pci_bridge_check_ranges(struct pci_bus *bus)
 }
 
 /*
- * Helper function for sizing routines: find first available bus resource
- * of a given type.  Note: we intentionally skip the bus resources which
- * have already been assigned (that is, have non-NULL parent resource).
+ * Helper function for sizing routines.
+ * Assigned resources have non-NULL parent resource.
+ *
+ * Return first unassigned resource of the correct type.
+ * If none of the above, return first assigned resource of the correct type.
+ * If none of the above, return NULL.
+ *
+ * Returning an assigned resource of the correct type allows the caller to
+ * distinguish between already assigned and no resource of the correct type.
  */
-static struct resource *find_free_bus_resource(struct pci_bus *bus,
-					       unsigned long type_mask,
-					       unsigned long type)
+static struct resource *find_bus_resource_of_type(struct pci_bus *bus,
+						  unsigned long type_mask,
+						  unsigned long type)
 {
+	struct resource *r, *r_assigned = NULL;
 	int i;
-	struct resource *r;
 
 	pci_bus_for_each_resource(bus, r, i) {
 		if (r == &ioport_resource || r == &iomem_resource)
 			continue;
 		if (r && (r->flags & type_mask) == type && !r->parent)
 			return r;
+		if (r && (r->flags & type_mask) == type && !r_assigned)
+			r_assigned = r;
 	}
-	return NULL;
+	return r_assigned;
 }
 
 static resource_size_t calculate_iosize(resource_size_t size,
@@ -866,8 +874,8 @@  static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 			 struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
-	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
-							IORESOURCE_IO);
+	struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
+								IORESOURCE_IO);
 	resource_size_t size = 0, size0 = 0, size1 = 0;
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
@@ -875,6 +883,10 @@  static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 	if (!b_res)
 		return;
 
+	/* If resource is already assigned, nothing more to do. */
+	if (b_res->parent)
+		return;
+
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		int i;
@@ -978,7 +990,7 @@  static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t min_align, align, size, size0, size1;
 	resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */
 	int order, max_order;
-	struct resource *b_res = find_free_bus_resource(bus,
+	struct resource *b_res = find_bus_resource_of_type(bus,
 					mask | IORESOURCE_PREFETCH, type);
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
@@ -987,6 +999,10 @@  static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	if (!b_res)
 		return -ENOSPC;
 
+	/* If resource is already assigned, nothing more to do. */
+	if (b_res->parent)
+		return 0;
+
 	memset(aligns, 0, sizeof(aligns));
 	max_order = 0;
 	size = 0;