diff mbox series

[v7,17/26] PCI: hotplug: Ignore the MEM BAR offsets from BIOS/bootloader

Message ID 20200129152937.311162-18-s.miroshnichenko@yadro.com
State New
Headers show
Series PCI: Allow BAR movement during boot and hotplug | expand

Commit Message

Sergei Miroshnichenko Jan. 29, 2020, 3:29 p.m. UTC
BAR allocation by BIOS/UEFI/bootloader/firmware may be non-optimal and
it may even clash with the kernel's BAR assignment algorithm.

For example, sometimes BIOS doesn't reserve space for SR-IOV BARs, and
this bridge window can neither extend (blocked by immovable BARs) nor
move (the device itself is immovable).

With this patch the kernel will use its own methods of BAR allocating
when possible, increasing the chances of successful boot and hotplug.

Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
---
 drivers/pci/probe.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas Jan. 31, 2020, 8:31 p.m. UTC | #1
On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko wrote:
> BAR allocation by BIOS/UEFI/bootloader/firmware may be non-optimal and
> it may even clash with the kernel's BAR assignment algorithm.
> 
> For example, sometimes BIOS doesn't reserve space for SR-IOV BARs, and
> this bridge window can neither extend (blocked by immovable BARs) nor
> move (the device itself is immovable).
> 
> With this patch the kernel will use its own methods of BAR allocating
> when possible, increasing the chances of successful boot and hotplug.
> 
> Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> ---
>  drivers/pci/probe.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index bb584038d5b4..f8f643dac6d1 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>  			 pos, (unsigned long long)region.start);
>  	}
>  
> +	if (pci_can_move_bars && res->start && !(res->flags & IORESOURCE_IO)) {
> +		pci_warn(dev, "ignore the current offset of BAR %llx-%llx\n",
> +			 l64, l64 + sz64 - 1);
> +		res->start = 0;
> +		res->end = sz64 - 1;
> +		res->flags |= IORESOURCE_SIZEALIGN;
> +	}
> +
>  	goto out;
>  
>  
> @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus *child)
>  		child->resource[i] = &dev->resource[PCI_BRIDGE_RESOURCES+i];
>  
>  	pci_read_bridge_io(child);
> -	pci_read_bridge_mmio(child);
> -	pci_read_bridge_mmio_pref(child);
> +	if (!pci_can_move_bars) {
> +		pci_read_bridge_mmio(child);
> +		pci_read_bridge_mmio_pref(child);
> +	}

I mentioned this in another response, but I'll repeat it here to
comment on the code directly: I don't think we should have feature
tests like this "!pci_can_move_bars" scattered around, and I don't
want basic behaviors like reading bridge windows during enumeration to
depend on it.

There's no obvious reason why we should ignore bridge windows if we
support movable BARs.

>  	if (dev->transparent) {
>  		pci_bus_for_each_resource(child->parent, res, i) {
> @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge *bridge)
>  		pci_bus_claim_resources(bus);
>  	} else {
>  		pci_bus_size_bridges(bus);
> +		if (pci_can_move_bars)
> +			pci_bus_update_realloc_range(bus);
>  		pci_bus_assign_resources(bus);
>  
>  		list_for_each_entry(child, &bus->children, node)
> -- 
> 2.24.1
>
Sergei Miroshnichenko Feb. 5, 2020, 11:01 a.m. UTC | #2
On Fri, 2020-01-31 at 14:31 -0600, Bjorn Helgaas wrote:
> On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko
> wrote:
> > BAR allocation by BIOS/UEFI/bootloader/firmware may be non-optimal
> > and
> > it may even clash with the kernel's BAR assignment algorithm.
> > 
> > For example, sometimes BIOS doesn't reserve space for SR-IOV BARs,
> > and
> > this bridge window can neither extend (blocked by immovable BARs)
> > nor
> > move (the device itself is immovable).
> > 
> > With this patch the kernel will use its own methods of BAR
> > allocating
> > when possible, increasing the chances of successful boot and
> > hotplug.
> > 
> > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > ---
> >  drivers/pci/probe.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index bb584038d5b4..f8f643dac6d1 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum
> > pci_bar_type type,
> >  			 pos, (unsigned long long)region.start);
> >  	}
> >  
> > +	if (pci_can_move_bars && res->start && !(res->flags &
> > IORESOURCE_IO)) {
> > +		pci_warn(dev, "ignore the current offset of BAR %llx-
> > %llx\n",
> > +			 l64, l64 + sz64 - 1);
> > +		res->start = 0;
> > +		res->end = sz64 - 1;
> > +		res->flags |= IORESOURCE_SIZEALIGN;
> > +	}
> > +
> >  	goto out;
> >  
> >  
> > @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus
> > *child)
> >  		child->resource[i] = &dev-
> > >resource[PCI_BRIDGE_RESOURCES+i];
> >  
> >  	pci_read_bridge_io(child);
> > -	pci_read_bridge_mmio(child);
> > -	pci_read_bridge_mmio_pref(child);
> > +	if (!pci_can_move_bars) {
> > +		pci_read_bridge_mmio(child);
> > +		pci_read_bridge_mmio_pref(child);
> > +	}
> 
> I mentioned this in another response, but I'll repeat it here to
> comment on the code directly: I don't think we should have feature
> tests like this "!pci_can_move_bars" scattered around, and I don't
> want basic behaviors like reading bridge windows during enumeration
> to
> depend on it.
> 
> There's no obvious reason why we should ignore bridge windows if we
> support movable BARs.
> 


That patch solves a problem which is non-fatal during boot, but is
breaking this whole patchset when trying a PCI rescan. On a specific
machine we have a popular i350 network card:

$ lspci -tv
-[0000:00]-...
           +-01.1-[02]--+-00.0  Intel Corporation I350 Gigabit Network
Connection
           |            +-00.1  Intel Corporation I350 Gigabit Network
Connection
           |            +-00.2  Intel Corporation I350 Gigabit Network
Connection
           |            \-00.3  Intel Corporation I350 Gigabit Network
Connection

On the "ROG STRIX Z370-F GAMING, BIOS 0612 03/01/2018" motherboard and
vanilla kernel, not every BAR is allocated:

$ dmesg -t
  ...
  pci 0000:00:01.0: PCI bridge to [bus 01]
  pci 0000:00:01.0:   bridge window [mem 0xf7700000-0xf78fffff]
  pci 0000:02:00.0: BAR 7: assigned [mem 0xf7490000-0xf74affff]
  pci 0000:02:00.0: BAR 10: assigned [mem 0xf74b0000-0xf74cffff]
  pci 0000:02:00.1: BAR 7: assigned [mem 0xf74d0000-0xf74effff]
  pci 0000:02:00.1: BAR 10: no space for [mem size 0x00020000]
  pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00020000]
  pci 0000:02:00.2: BAR 7: no space for [mem size 0x00020000]
  pci 0000:02:00.2: BAR 7: failed to assign [mem size 0x00020000]
  pci 0000:02:00.2: BAR 10: no space for [mem size 0x00020000]
  pci 0000:02:00.2: BAR 10: failed to assign [mem size 0x00020000]
  pci 0000:02:00.3: BAR 7: no space for [mem size 0x00020000]
  pci 0000:02:00.3: BAR 7: failed to assign [mem size 0x00020000]
  pci 0000:02:00.3: BAR 10: no space for [mem size 0x00020000]
  pci 0000:02:00.3: BAR 10: failed to assign [mem size 0x00020000]
  pci 0000:00:01.1: PCI bridge to [bus 02]

$ sudo cat /proc/iomem
  ...
  f7000000-f74fffff : PCI Bus 0000:02
    f7000000-f70fffff : 0000:02:00.3
      f7000000-f70fffff : igb
    f7100000-f71fffff : 0000:02:00.2
      f7100000-f71fffff : igb
    f7200000-f72fffff : 0000:02:00.1
      f7200000-f72fffff : igb
    f7300000-f73fffff : 0000:02:00.0
      f7300000-f73fffff : igb
    f7400000-f747ffff : 0000:02:00.0
    f7480000-f7483fff : 0000:02:00.3
      f7480000-f7483fff : igb
    f7484000-f7487fff : 0000:02:00.2
      f7484000-f7487fff : igb
    f7488000-f748bfff : 0000:02:00.1
      f7488000-f748bfff : igb
    f748c000-f748ffff : 0000:02:00.0
      f748c000-f748ffff : igb
    f7490000-f74affff : 0000:02:00.0
    f74b0000-f74cffff : 0000:02:00.0
    f74d0000-f74effff : 0000:02:00.1

But when allowing the kernel to allocate BARs properly, the map is
full:

  c8200000-c87fffff : PCI Bus 0000:02
    c8200000-c82fffff : 0000:02:00.0
      c8200000-c82fffff : igb
    c8300000-c83fffff : 0000:02:00.1
      c8300000-c83fffff : igb
    c8400000-c84fffff : 0000:02:00.2
      c8400000-c84fffff : igb
    c8500000-c85fffff : 0000:02:00.3
      c8500000-c85fffff : igb
    c8600000-c867ffff : 0000:02:00.0
    c8680000-c8683fff : 0000:02:00.0
      c8680000-c8683fff : igb
    c8684000-c86a3fff : 0000:02:00.0
    c86a4000-c86c3fff : 0000:02:00.0
    c86c4000-c86c7fff : 0000:02:00.1
      c86c4000-c86c7fff : igb
    c86c8000-c86e7fff : 0000:02:00.1
    c86e8000-c8707fff : 0000:02:00.1
    c8708000-c870bfff : 0000:02:00.2
      c8708000-c870bfff : igb
    c870c000-c872bfff : 0000:02:00.2
    c872c000-c874bfff : 0000:02:00.2
    c874c000-c874ffff : 0000:02:00.3
      c874c000-c874ffff : igb
    c8750000-c876ffff : 0000:02:00.3
    c8770000-c878ffff : 0000:02:00.3

In this particular case the "repaired" BARs are not vital and are not
used by the igb driver, but in general such behavior of BIOS can lead
to a non-working setup.

So ignoring pre-set BARs and bridge windows may be useful on its own,
but it is also provides a working starting point required by this
patchset, otherwise it will need to track such BARs impossible to
assign, and don't try to assign them during a next PCI rescan.

The reason I've tied this feature to the "movable BARs" flag is that I
know at least one exception demanding a workaround - VGA. So I wanted
to provide a flag to disable it in case of other unforeseen
consequences, and the only feature depends on this - is movable BARs.

The [07/26] "PCI: hotplug: Don't allow hot-added devices to steal
resources" patch introduces an additional step in BAR assignment:
 - try to assign every existing BAR + BARs of the hot-added device;
 - it if fails, disable BARs for the hot-added device and retry without
   them.

A possible way to work-around non-working BARs could be adding more
steps:
 - first try to assign every existing BAR + BARs not worked previously
   + "hot-added" BARs;
 - if it fails, retry without those BARs which weren't working before;
 - if it still fails, retry without "hot-added" BARs.

Best regards,
Serge

> >  	if (dev->transparent) {
> >  		pci_bus_for_each_resource(child->parent, res, i) {
> > @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge
> > *bridge)
> >  		pci_bus_claim_resources(bus);
> >  	} else {
> >  		pci_bus_size_bridges(bus);
> > +		if (pci_can_move_bars)
> > +			pci_bus_update_realloc_range(bus);
> >  		pci_bus_assign_resources(bus);
> >  
> >  		list_for_each_entry(child, &bus->children, node)
> > -- 
> > 2.24.1
> >
Bjorn Helgaas Feb. 5, 2020, 4:42 p.m. UTC | #3
On Wed, Feb 05, 2020 at 11:01:32AM +0000, Sergei Miroshnichenko wrote:
> On Fri, 2020-01-31 at 14:31 -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko
> > wrote:
> > > BAR allocation by BIOS/UEFI/bootloader/firmware may be
> > > non-optimal and it may even clash with the kernel's BAR
> > > assignment algorithm.
> > > 
> > > For example, sometimes BIOS doesn't reserve space for SR-IOV
> > > BARs, and this bridge window can neither extend (blocked by
> > > immovable BARs) nor move (the device itself is immovable).
> > > 
> > > With this patch the kernel will use its own methods of BAR
> > > allocating when possible, increasing the chances of successful
> > > boot and hotplug.
> > > 
> > > Signed-off-by: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
> > > ---
> > >  drivers/pci/probe.c | 16 ++++++++++++++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index bb584038d5b4..f8f643dac6d1 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev, enum
> > > pci_bar_type type,
> > >  			 pos, (unsigned long long)region.start);
> > >  	}
> > >  
> > > +	if (pci_can_move_bars && res->start && !(res->flags &
> > > IORESOURCE_IO)) {
> > > +		pci_warn(dev, "ignore the current offset of BAR %llx-
> > > %llx\n",
> > > +			 l64, l64 + sz64 - 1);
> > > +		res->start = 0;
> > > +		res->end = sz64 - 1;
> > > +		res->flags |= IORESOURCE_SIZEALIGN;
> > > +	}
> > > +
> > >  	goto out;
> > >  
> > >  
> > > @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus
> > > *child)
> > >  		child->resource[i] = &dev-
> > > >resource[PCI_BRIDGE_RESOURCES+i];
> > >  
> > >  	pci_read_bridge_io(child);
> > > -	pci_read_bridge_mmio(child);
> > > -	pci_read_bridge_mmio_pref(child);
> > > +	if (!pci_can_move_bars) {
> > > +		pci_read_bridge_mmio(child);
> > > +		pci_read_bridge_mmio_pref(child);
> > > +	}
> > 
> > I mentioned this in another response, but I'll repeat it here to
> > comment on the code directly: I don't think we should have feature
> > tests like this "!pci_can_move_bars" scattered around, and I don't
> > want basic behaviors like reading bridge windows during enumeration
> > to
> > depend on it.
> > 
> > There's no obvious reason why we should ignore bridge windows if we
> > support movable BARs.
> 
> That patch solves a problem which is non-fatal during boot, but is
> breaking this whole patchset when trying a PCI rescan. On a specific
> machine we have a popular i350 network card:
> 
> $ lspci -tv
> -[0000:00]-...
>            +-01.1-[02]--+-00.0  Intel Corporation I350 Gigabit Network
> Connection
>            |            +-00.1  Intel Corporation I350 Gigabit Network
> Connection
>            |            +-00.2  Intel Corporation I350 Gigabit Network
> Connection
>            |            \-00.3  Intel Corporation I350 Gigabit Network
> Connection
> 
> On the "ROG STRIX Z370-F GAMING, BIOS 0612 03/01/2018" motherboard and
> vanilla kernel, not every BAR is allocated:
> 
> $ dmesg -t
>   ...
>   pci 0000:00:01.0: PCI bridge to [bus 01]
>   pci 0000:00:01.0:   bridge window [mem 0xf7700000-0xf78fffff]
>   pci 0000:02:00.0: BAR 7: assigned [mem 0xf7490000-0xf74affff]
>   pci 0000:02:00.0: BAR 10: assigned [mem 0xf74b0000-0xf74cffff]
>   pci 0000:02:00.1: BAR 7: assigned [mem 0xf74d0000-0xf74effff]
>   pci 0000:02:00.1: BAR 10: no space for [mem size 0x00020000]
>   pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 7: no space for [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 7: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 10: no space for [mem size 0x00020000]
>   pci 0000:02:00.2: BAR 10: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 7: no space for [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 7: failed to assign [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 10: no space for [mem size 0x00020000]
>   pci 0000:02:00.3: BAR 10: failed to assign [mem size 0x00020000]
>   pci 0000:00:01.1: PCI bridge to [bus 02]
> 
> $ sudo cat /proc/iomem
>   ...
>   f7000000-f74fffff : PCI Bus 0000:02
>     f7000000-f70fffff : 0000:02:00.3
>       f7000000-f70fffff : igb
>     f7100000-f71fffff : 0000:02:00.2
>       f7100000-f71fffff : igb
>     f7200000-f72fffff : 0000:02:00.1
>       f7200000-f72fffff : igb
>     f7300000-f73fffff : 0000:02:00.0
>       f7300000-f73fffff : igb
>     f7400000-f747ffff : 0000:02:00.0
>     f7480000-f7483fff : 0000:02:00.3
>       f7480000-f7483fff : igb
>     f7484000-f7487fff : 0000:02:00.2
>       f7484000-f7487fff : igb
>     f7488000-f748bfff : 0000:02:00.1
>       f7488000-f748bfff : igb
>     f748c000-f748ffff : 0000:02:00.0
>       f748c000-f748ffff : igb
>     f7490000-f74affff : 0000:02:00.0
>     f74b0000-f74cffff : 0000:02:00.0
>     f74d0000-f74effff : 0000:02:00.1
> 
> But when allowing the kernel to allocate BARs properly, the map is
> full:
> 
>   c8200000-c87fffff : PCI Bus 0000:02
>     c8200000-c82fffff : 0000:02:00.0
>       c8200000-c82fffff : igb
>     c8300000-c83fffff : 0000:02:00.1
>       c8300000-c83fffff : igb
>     c8400000-c84fffff : 0000:02:00.2
>       c8400000-c84fffff : igb
>     c8500000-c85fffff : 0000:02:00.3
>       c8500000-c85fffff : igb
>     c8600000-c867ffff : 0000:02:00.0
>     c8680000-c8683fff : 0000:02:00.0
>       c8680000-c8683fff : igb
>     c8684000-c86a3fff : 0000:02:00.0
>     c86a4000-c86c3fff : 0000:02:00.0
>     c86c4000-c86c7fff : 0000:02:00.1
>       c86c4000-c86c7fff : igb
>     c86c8000-c86e7fff : 0000:02:00.1
>     c86e8000-c8707fff : 0000:02:00.1
>     c8708000-c870bfff : 0000:02:00.2
>       c8708000-c870bfff : igb
>     c870c000-c872bfff : 0000:02:00.2
>     c872c000-c874bfff : 0000:02:00.2
>     c874c000-c874ffff : 0000:02:00.3
>       c874c000-c874ffff : igb
>     c8750000-c876ffff : 0000:02:00.3
>     c8770000-c878ffff : 0000:02:00.3
> 
> In this particular case the "repaired" BARs are not vital and are not
> used by the igb driver, but in general such behavior of BIOS can lead
> to a non-working setup.
> 
> So ignoring pre-set BARs and bridge windows may be useful on its own,
> but it is also provides a working starting point required by this
> patchset, otherwise it will need to track such BARs impossible to
> assign, and don't try to assign them during a next PCI rescan.
> 
> The reason I've tied this feature to the "movable BARs" flag is that I
> know at least one exception demanding a workaround - VGA. So I wanted
> to provide a flag to disable it in case of other unforeseen
> consequences, and the only feature depends on this - is movable BARs.

If we need exceptions for broken or legacy devices, we should check
for those explicitly.

If we fail to assign some BARs at boot, I think it's reasonable to try
to reassign things before drivers claim the devices.  But support for
that should be in a reassignment path, not in the general enumeration
path.  And, since drivers aren't involved yet, it probably doesn't
even depend on pci_can_move_bars.

> The [07/26] "PCI: hotplug: Don't allow hot-added devices to steal
> resources" patch introduces an additional step in BAR assignment:
>  - try to assign every existing BAR + BARs of the hot-added device;
>  - it if fails, disable BARs for the hot-added device and retry without
>    them.
> 
> A possible way to work-around non-working BARs could be adding more
> steps:
>  - first try to assign every existing BAR + BARs not worked previously
>    + "hot-added" BARs;
>  - if it fails, retry without those BARs which weren't working before;
>  - if it still fails, retry without "hot-added" BARs.
> 
> Best regards,
> Serge
> 
> > >  	if (dev->transparent) {
> > >  		pci_bus_for_each_resource(child->parent, res, i) {
> > > @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge
> > > *bridge)
> > >  		pci_bus_claim_resources(bus);
> > >  	} else {
> > >  		pci_bus_size_bridges(bus);
> > > +		if (pci_can_move_bars)
> > > +			pci_bus_update_realloc_range(bus);
> > >  		pci_bus_assign_resources(bus);
> > >  
> > >  		list_for_each_entry(child, &bus->children, node)
> > > -- 
> > > 2.24.1
> > >
Sergei Miroshnichenko Feb. 12, 2020, 12:29 p.m. UTC | #4
On Wed, 2020-02-05 at 10:42 -0600, Bjorn Helgaas wrote:
> On Wed, Feb 05, 2020 at 11:01:32AM +0000, Sergei Miroshnichenko
> wrote:
> > On Fri, 2020-01-31 at 14:31 -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 29, 2020 at 06:29:28PM +0300, Sergei Miroshnichenko
> > > wrote:
> > > > BAR allocation by BIOS/UEFI/bootloader/firmware may be
> > > > non-optimal and it may even clash with the kernel's BAR
> > > > assignment algorithm.
> > > > 
> > > > For example, sometimes BIOS doesn't reserve space for SR-IOV
> > > > BARs, and this bridge window can neither extend (blocked by
> > > > immovable BARs) nor move (the device itself is immovable).
> > > > 
> > > > With this patch the kernel will use its own methods of BAR
> > > > allocating when possible, increasing the chances of successful
> > > > boot and hotplug.
> > > > 
> > > > Signed-off-by: Sergei Miroshnichenko <
> > > > s.miroshnichenko@yadro.com>
> > > > ---
> > > >  drivers/pci/probe.c | 16 ++++++++++++++--
> > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > > index bb584038d5b4..f8f643dac6d1 100644
> > > > --- a/drivers/pci/probe.c
> > > > +++ b/drivers/pci/probe.c
> > > > @@ -306,6 +306,14 @@ int __pci_read_base(struct pci_dev *dev,
> > > > enum
> > > > pci_bar_type type,
> > > >  			 pos, (unsigned long
> > > > long)region.start);
> > > >  	}
> > > >  
> > > > +	if (pci_can_move_bars && res->start && !(res->flags &
> > > > IORESOURCE_IO)) {
> > > > +		pci_warn(dev, "ignore the current offset of BAR
> > > > %llx-
> > > > %llx\n",
> > > > +			 l64, l64 + sz64 - 1);
> > > > +		res->start = 0;
> > > > +		res->end = sz64 - 1;
> > > > +		res->flags |= IORESOURCE_SIZEALIGN;
> > > > +	}
> > > > +
> > > >  	goto out;
> > > >  
> > > >  
> > > > @@ -528,8 +536,10 @@ void pci_read_bridge_bases(struct pci_bus
> > > > *child)
> > > >  		child->resource[i] = &dev-
> > > > > resource[PCI_BRIDGE_RESOURCES+i];
> > > >  
> > > >  	pci_read_bridge_io(child);
> > > > -	pci_read_bridge_mmio(child);
> > > > -	pci_read_bridge_mmio_pref(child);
> > > > +	if (!pci_can_move_bars) {
> > > > +		pci_read_bridge_mmio(child);
> > > > +		pci_read_bridge_mmio_pref(child);
> > > > +	}
> > > 
> > > I mentioned this in another response, but I'll repeat it here to
> > > comment on the code directly: I don't think we should have
> > > feature
> > > tests like this "!pci_can_move_bars" scattered around, and I
> > > don't
> > > want basic behaviors like reading bridge windows during
> > > enumeration
> > > to
> > > depend on it.
> > > 
> > > There's no obvious reason why we should ignore bridge windows if
> > > we
> > > support movable BARs.
> > 
> > That patch solves a problem which is non-fatal during boot, but is
> > breaking this whole patchset when trying a PCI rescan. On a
> > specific
> > machine we have a popular i350 network card:
> > 
> > $ lspci -tv
> > -[0000:00]-...
> >            +-01.1-[02]--+-00.0  Intel Corporation I350 Gigabit
> > Network
> > Connection
> >            |            +-00.1  Intel Corporation I350 Gigabit
> > Network
> > Connection
> >            |            +-00.2  Intel Corporation I350 Gigabit
> > Network
> > Connection
> >            |            \-00.3  Intel Corporation I350 Gigabit
> > Network
> > Connection
> > 
> > On the "ROG STRIX Z370-F GAMING, BIOS 0612 03/01/2018" motherboard
> > and
> > vanilla kernel, not every BAR is allocated:
> > 
> > $ dmesg -t
> >   ...
> >   pci 0000:00:01.0: PCI bridge to [bus 01]
> >   pci 0000:00:01.0:   bridge window [mem 0xf7700000-0xf78fffff]
> >   pci 0000:02:00.0: BAR 7: assigned [mem 0xf7490000-0xf74affff]
> >   pci 0000:02:00.0: BAR 10: assigned [mem 0xf74b0000-0xf74cffff]
> >   pci 0000:02:00.1: BAR 7: assigned [mem 0xf74d0000-0xf74effff]
> >   pci 0000:02:00.1: BAR 10: no space for [mem size 0x00020000]
> >   pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 7: no space for [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 7: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 10: no space for [mem size 0x00020000]
> >   pci 0000:02:00.2: BAR 10: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 7: no space for [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 7: failed to assign [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 10: no space for [mem size 0x00020000]
> >   pci 0000:02:00.3: BAR 10: failed to assign [mem size 0x00020000]
> >   pci 0000:00:01.1: PCI bridge to [bus 02]
> > 
> > $ sudo cat /proc/iomem
> >   ...
> >   f7000000-f74fffff : PCI Bus 0000:02
> >     f7000000-f70fffff : 0000:02:00.3
> >       f7000000-f70fffff : igb
> >     f7100000-f71fffff : 0000:02:00.2
> >       f7100000-f71fffff : igb
> >     f7200000-f72fffff : 0000:02:00.1
> >       f7200000-f72fffff : igb
> >     f7300000-f73fffff : 0000:02:00.0
> >       f7300000-f73fffff : igb
> >     f7400000-f747ffff : 0000:02:00.0
> >     f7480000-f7483fff : 0000:02:00.3
> >       f7480000-f7483fff : igb
> >     f7484000-f7487fff : 0000:02:00.2
> >       f7484000-f7487fff : igb
> >     f7488000-f748bfff : 0000:02:00.1
> >       f7488000-f748bfff : igb
> >     f748c000-f748ffff : 0000:02:00.0
> >       f748c000-f748ffff : igb
> >     f7490000-f74affff : 0000:02:00.0
> >     f74b0000-f74cffff : 0000:02:00.0
> >     f74d0000-f74effff : 0000:02:00.1
> > 
> > But when allowing the kernel to allocate BARs properly, the map is
> > full:
> > 
> >   c8200000-c87fffff : PCI Bus 0000:02
> >     c8200000-c82fffff : 0000:02:00.0
> >       c8200000-c82fffff : igb
> >     c8300000-c83fffff : 0000:02:00.1
> >       c8300000-c83fffff : igb
> >     c8400000-c84fffff : 0000:02:00.2
> >       c8400000-c84fffff : igb
> >     c8500000-c85fffff : 0000:02:00.3
> >       c8500000-c85fffff : igb
> >     c8600000-c867ffff : 0000:02:00.0
> >     c8680000-c8683fff : 0000:02:00.0
> >       c8680000-c8683fff : igb
> >     c8684000-c86a3fff : 0000:02:00.0
> >     c86a4000-c86c3fff : 0000:02:00.0
> >     c86c4000-c86c7fff : 0000:02:00.1
> >       c86c4000-c86c7fff : igb
> >     c86c8000-c86e7fff : 0000:02:00.1
> >     c86e8000-c8707fff : 0000:02:00.1
> >     c8708000-c870bfff : 0000:02:00.2
> >       c8708000-c870bfff : igb
> >     c870c000-c872bfff : 0000:02:00.2
> >     c872c000-c874bfff : 0000:02:00.2
> >     c874c000-c874ffff : 0000:02:00.3
> >       c874c000-c874ffff : igb
> >     c8750000-c876ffff : 0000:02:00.3
> >     c8770000-c878ffff : 0000:02:00.3
> > 
> > In this particular case the "repaired" BARs are not vital and are
> > not
> > used by the igb driver, but in general such behavior of BIOS can
> > lead
> > to a non-working setup.
> > 
> > So ignoring pre-set BARs and bridge windows may be useful on its
> > own,
> > but it is also provides a working starting point required by this
> > patchset, otherwise it will need to track such BARs impossible to
> > assign, and don't try to assign them during a next PCI rescan.
> > 
> > The reason I've tied this feature to the "movable BARs" flag is
> > that I
> > know at least one exception demanding a workaround - VGA. So I
> > wanted
> > to provide a flag to disable it in case of other unforeseen
> > consequences, and the only feature depends on this - is movable
> > BARs.
> 
> If we need exceptions for broken or legacy devices, we should check
> for those explicitly.
> 
> If we fail to assign some BARs at boot, I think it's reasonable to
> try
> to reassign things before drivers claim the devices.  But support for
> that should be in a reassignment path, not in the general enumeration
> path.  And, since drivers aren't involved yet, it probably doesn't
> even depend on pci_can_move_bars.
> 

Thanks for the advice, it worked! Now I'm validating in the end of
__init pci_assign_unassigned_resources() that every device got desired
BARs, otherwise call to pci_rescan_bus(), which is allowed to move BARs
- indeed not yet captured by drivers at this stage. This will be
submitted in v8.

Serge

> > The [07/26] "PCI: hotplug: Don't allow hot-added devices to steal
> > resources" patch introduces an additional step in BAR assignment:
> >  - try to assign every existing BAR + BARs of the hot-added device;
> >  - it if fails, disable BARs for the hot-added device and retry
> > without
> >    them.
> > 
> > A possible way to work-around non-working BARs could be adding more
> > steps:
> >  - first try to assign every existing BAR + BARs not worked
> > previously
> >    + "hot-added" BARs;
> >  - if it fails, retry without those BARs which weren't working
> > before;
> >  - if it still fails, retry without "hot-added" BARs.
> > 
> > Best regards,
> > Serge
> > 
> > > >  	if (dev->transparent) {
> > > >  		pci_bus_for_each_resource(child->parent, res,
> > > > i) {
> > > > @@ -2945,6 +2955,8 @@ int pci_host_probe(struct pci_host_bridge
> > > > *bridge)
> > > >  		pci_bus_claim_resources(bus);
> > > >  	} else {
> > > >  		pci_bus_size_bridges(bus);
> > > > +		if (pci_can_move_bars)
> > > > +			pci_bus_update_realloc_range(bus);
> > > >  		pci_bus_assign_resources(bus);
> > > >  
> > > >  		list_for_each_entry(child, &bus->children,
> > > > node)
> > > > -- 
> > > > 2.24.1
> > > >
diff mbox series

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bb584038d5b4..f8f643dac6d1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -306,6 +306,14 @@  int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 			 pos, (unsigned long long)region.start);
 	}
 
+	if (pci_can_move_bars && res->start && !(res->flags & IORESOURCE_IO)) {
+		pci_warn(dev, "ignore the current offset of BAR %llx-%llx\n",
+			 l64, l64 + sz64 - 1);
+		res->start = 0;
+		res->end = sz64 - 1;
+		res->flags |= IORESOURCE_SIZEALIGN;
+	}
+
 	goto out;
 
 
@@ -528,8 +536,10 @@  void pci_read_bridge_bases(struct pci_bus *child)
 		child->resource[i] = &dev->resource[PCI_BRIDGE_RESOURCES+i];
 
 	pci_read_bridge_io(child);
-	pci_read_bridge_mmio(child);
-	pci_read_bridge_mmio_pref(child);
+	if (!pci_can_move_bars) {
+		pci_read_bridge_mmio(child);
+		pci_read_bridge_mmio_pref(child);
+	}
 
 	if (dev->transparent) {
 		pci_bus_for_each_resource(child->parent, res, i) {
@@ -2945,6 +2955,8 @@  int pci_host_probe(struct pci_host_bridge *bridge)
 		pci_bus_claim_resources(bus);
 	} else {
 		pci_bus_size_bridges(bus);
+		if (pci_can_move_bars)
+			pci_bus_update_realloc_range(bus);
 		pci_bus_assign_resources(bus);
 
 		list_for_each_entry(child, &bus->children, node)