diff mbox series

PCI: Release coalesced resource

Message ID 20230525153248.712779-1-ross.lagerwall@citrix.com
State New
Headers show
Series PCI: Release coalesced resource | expand

Commit Message

Ross Lagerwall May 25, 2023, 3:32 p.m. UTC
When contiguous windows are coalesced, the resource is invalidated and
consequently not added to the bus. However, it remains in the resource
hierarchy:

...
  ef2fff00-ef2fffff : 0000:00:13.2
    ef2fff00-ef2fffff : ehci_hcd
00000000-00000000 : PCI Bus 0000:00
f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
  f0000000-f3ffffff : Reserved
...

In some cases (e.g. the Xen scratch region), this causes future calls to
allocate_resource() to choose an inappropriate location which the caller
cannot handle. Fix by releasing the resource and removing from the
hierarchy.

Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 drivers/pci/probe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Ross Lagerwall June 5, 2023, 8:44 a.m. UTC | #1
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Sent: Thursday, May 25, 2023 4:32 PM
> To: linux-pci@vger.kernel.org <linux-pci@vger.kernel.org>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; Kai-Heng Feng <kai.heng.feng@canonical.com>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; Ross Lagerwall <ross.lagerwall@citrix.com>
> Subject: [PATCH] PCI: Release coalesced resource 
>  
> When contiguous windows are coalesced, the resource is invalidated and
> consequently not added to the bus. However, it remains in the resource
> hierarchy:
> 
> ...
>   ef2fff00-ef2fffff : 0000:00:13.2
>     ef2fff00-ef2fffff : ehci_hcd
> 00000000-00000000 : PCI Bus 0000:00
> f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
>   f0000000-f3ffffff : Reserved
> ...
> 
> In some cases (e.g. the Xen scratch region), this causes future calls to
> allocate_resource() to choose an inappropriate location which the caller
> cannot handle. Fix by releasing the resource and removing from the
> hierarchy.
> 
> Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  drivers/pci/probe.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 0b2826c4a832..00ed20ac0dd6 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -997,8 +997,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>          resource_list_for_each_entry_safe(window, n, &resources) {
>                  offset = window->offset;
>                  res = window->res;
> -               if (!res->flags && !res->start && !res->end)
> +               if (!res->flags && !res->start && !res->end) {
> +                       release_resource(res);
>                          continue;
> +               }
>  
>                  list_move_tail(&window->node, &bridge->windows);
>  
> -- 
> 2.31.1

Ping?
Bjorn Helgaas June 6, 2023, 10:36 p.m. UTC | #2
On Thu, May 25, 2023 at 04:32:48PM +0100, Ross Lagerwall wrote:
> When contiguous windows are coalesced, the resource is invalidated and
> consequently not added to the bus. However, it remains in the resource
> hierarchy:
> 
> ...
>   ef2fff00-ef2fffff : 0000:00:13.2
>     ef2fff00-ef2fffff : ehci_hcd
> 00000000-00000000 : PCI Bus 0000:00
> f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
>   f0000000-f3ffffff : Reserved
> ...

I assume the "00000000-00000000 : PCI Bus 0000:00" is the problematic
part?  Is there anything in dmesg that shows the resources before they
were coalesced?

Is there an error message we could include here to link the problem
with the solution?

> In some cases (e.g. the Xen scratch region), this causes future calls to
> allocate_resource() to choose an inappropriate location which the caller
> cannot handle. Fix by releasing the resource and removing from the
> hierarchy.
> 
> Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")

7c3855c423b1 appeared in v5.16, so we may need a stable tag?

> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  drivers/pci/probe.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 0b2826c4a832..00ed20ac0dd6 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -997,8 +997,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>  	resource_list_for_each_entry_safe(window, n, &resources) {
>  		offset = window->offset;
>  		res = window->res;
> -		if (!res->flags && !res->start && !res->end)
> +		if (!res->flags && !res->start && !res->end) {
> +			release_resource(res);
>  			continue;
> +		}
>  
>  		list_move_tail(&window->node, &bridge->windows);
>  
> -- 
> 2.31.1
>
Ross Lagerwall June 9, 2023, 3:39 p.m. UTC | #3
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Tuesday, June 6, 2023 11:36 PM
> To: Ross Lagerwall <ross.lagerwall@citrix.com>
> Cc: linux-pci@vger.kernel.org <linux-pci@vger.kernel.org>; Bjorn Helgaas <bhelgaas@google.com>; Kai-Heng Feng <kai.heng.feng@canonical.com>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH] PCI: Release coalesced resource 
>  
> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments unless you have verified the sender and know the content is safe.
> 
> On Thu, May 25, 2023 at 04:32:48PM +0100, Ross Lagerwall wrote:
> > When contiguous windows are coalesced, the resource is invalidated and
> > consequently not added to the bus. However, it remains in the resource
> > hierarchy:
> > 
> > ...
> >   ef2fff00-ef2fffff : 0000:00:13.2
> >     ef2fff00-ef2fffff : ehci_hcd
> > 00000000-00000000 : PCI Bus 0000:00
> > f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
> >   f0000000-f3ffffff : Reserved
> > ...
> 
> I assume the "00000000-00000000 : PCI Bus 0000:00" is the problematic
> part?  Is there anything in dmesg that shows the resources before they
> were coalesced?

Yes, that is the problematic part which gets removed by this patch.

dmesg doesn't show the resources before they were coalesced, but I
captured the output of /proc/iomem with/without the coalesce patch
to see what was being coalesced.

Without coalescing, this region ...

  fec00000-fec7ffff : PCI Bus 0000:00
    fec00000-fec003ff : IOAPIC 0
  fec80000-fecbffff : PCI Bus 0000:00
    fec80000-fec803ff : IOAPIC 1
    fec90000-fec93fff : pnp 00:06
	
... gets coalesced into:

  fec00000-fecbffff : PCI Bus 0000:00
    fec00000-fec003ff : IOAPIC 0
    fec80000-fec803ff : IOAPIC 1
    fec90000-fec93fff : pnp 00:06

> 
> Is there an error message we could include here to link the problem
> with the solution?

The error shows two "clipped" messages followed by a BUG when starting a VM under Xen.
Having said that, I don't think the error is specific to Xen - it just doesn't
handle getting back an unexpected resource range.

[ 2783.654292] clipped [mem 0x100000000-0x3fffffffffff] to [mem 0x230a07000-0x3fffffffffff] for e820 entry [mem 0x100000000-0x230a06fff]
[ 2783.654311] clipped [mem 0x230a07000-0x3fffffffffff] to [mem 0x10000000000-0x3fffffffffff] for e820 entry [mem 0xfd00000000-0xffffffffff]
[ 2783.710864] memmap_init_zone_device initialised 32768 pages in 0ms
[ 2783.711124] ------------[ cut here ]------------
[ 2783.711127] kernel BUG at arch/x86/xen/p2m.c:542!
[ 2783.711166] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 2783.711177] CPU: 1 PID: 1795 Comm: xenconsoled Not tainted 6.1.27+0 #1
[ 2783.711189] Hardware name: Dell Inc. PowerEdge R815/0272WF, BIOS 2.8.2 05/21/2012
[ 2783.711200] RIP: e030:xen_alloc_p2m_entry+0x57d/0x930
[ 2783.711222] Code: 3d 90 41 41 01 73 5d 48 8b 05 8f 41 41 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 68 fb ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff
[ 2783.711286] RSP: e02b:ffffc90040d37d80 EFLAGS: 00010246
[ 2783.711297] RAX: 0000000000000000 RBX: 0000000010007fff RCX: fff0000000000fff
[ 2783.711308] RDX: ffffc90040d37d98 RSI: ffffc9008003fff8 RDI: 0000007fc88de067
[ 2783.711318] RBP: ffffc90040d37e28 R08: 0000000000000000 R09: 000ffffffffff000
[ 2783.711328] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9008003fff8
[ 2783.711337] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000010008000
[ 2783.711358] FS:  00007f295754f740(0000) GS:ffff888230640000(0000) knlGS:0000000000000000
[ 2783.711370] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2783.711379] CR2: 00007f4c6344cff0 CR3: 0000000105e46000 CR4: 0000000000040660
[ 2783.711391] Call Trace:
[ 2783.711401]  <TASK>
[ 2783.711412]  xen_alloc_unpopulated_pages+0xa6/0x430
[ 2783.711429]  gnttab_alloc_pages+0x11/0x50
[ 2783.711441]  gntdev_alloc_map+0x1d2/0x2e0
[ 2783.711455]  gntdev_ioctl+0x261/0x540
[ 2783.711466]  __x64_sys_ioctl+0x8a/0xc0
[ 2783.711480]  do_syscall_64+0x3b/0x90
[ 2783.711494]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 2783.711506] RIP: 0033:0x7f2956a875d7
[ 2783.711516] Code: 44 00 00 48 8b 05 b9 08 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 08 2d 00 f7 d8 64 89 01 48
[ 2783.711536] RSP: 002b:00007ffcd8292e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 2783.711549] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f2956a875d7
[ 2783.711559] RDX: 00007ffcd8292e90 RSI: 0000000000184700 RDI: 000000000000000c
[ 2783.711569] RBP: 00007ffcd8292f30 R08: 00007ffcd8292f5c R09: 00007ffcd8292f58
[ 2783.711579] R10: 00007ffcd82928e0 R11: 0000000000000206 R12: 00007ffcd8292e90
[ 2783.711589] R13: 0000000000000003 R14: 000000000000000c R15: 0000000000000001
[ 2783.711602]  </TASK>
[ 2783.711608] Modules linked in: arptable_filter arp_tables tcp_diag udp_diag raw_diag inet_diag netlink_diag ebtable_filter ebtables nfsv3 nfs_acl nfs lockd grace fscache netfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_conncount nf_nat 8021q garp mrp stp llc ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter dm_multipath sunrpc dm_mod crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd ipmi_si psmouse ipmi_devintf k10temp i2c_piix4 sg fam15h_power ipmi_msghandler acpi_power_meter xen_wdt ip_tables x_tables hid_generic usbhid hid sd_mod t10_pi sr_mod cdrom crc64_rocksoft crc64 ohci_pci ahci libahci ehci_pci serio_raw ixgbe ehci_hcd mdio libata ohci_hcd xfrm_algo megaraid_sas bnx2 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod scsi_common ipv6 crc_ccitt
[ 2783.711805] ---[ end trace 0000000000000000 ]---
[ 2783.716538] RIP: e030:xen_alloc_p2m_entry+0x57d/0x930
[ 2783.716553] Code: 3d 90 41 41 01 73 5d 48 8b 05 8f 41 41 01 48 8b 04 f8 48 83 f8 ff 74 59 48 bf ff ff ff ff ff ff ff 3f 48 21 c7 e9 68 fb ff ff <0f> 0b 49 8d 7e 08 4c 89 f1 48 c7 c0 ff ff ff ff 49 c7 06 ff ff ff
[ 2783.716573] RSP: e02b:ffffc90040d37d80 EFLAGS: 00010246
[ 2783.716585] RAX: 0000000000000000 RBX: 0000000010007fff RCX: fff0000000000fff
[ 2783.716596] RDX: ffffc90040d37d98 RSI: ffffc9008003fff8 RDI: 0000007fc88de067
[ 2783.716608] RBP: ffffc90040d37e28 R08: 0000000000000000 R09: 000ffffffffff000
[ 2783.716620] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9008003fff8
[ 2783.716631] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000010008000
[ 2783.716648] FS:  00007f295754f740(0000) GS:ffff888230640000(0000) knlGS:0000000000000000
[ 2783.716661] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2783.716671] CR2: 00007f4c6344cff0 CR3: 0000000105e46000 CR4: 0000000000040660

This was separately reported here:

https://github.com/QubesOS/qubes-issues/issues/7918#issuecomment-1331763950

I have the dmesg and /proc/iomem logs here (somewhat older kernel):

https://pastebin.com/raw/8TQUp2uG

> 
> > In some cases (e.g. the Xen scratch region), this causes future calls to
> > allocate_resource() to choose an inappropriate location which the caller
> > cannot handle. Fix by releasing the resource and removing from the
> > hierarchy.
> > 
> > Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
> 
> 7c3855c423b1 appeared in v5.16, so we may need a stable tag?

Yes, I think so.

Thanks,
Ross
Bjorn Helgaas June 9, 2023, 9:45 p.m. UTC | #4
On Thu, May 25, 2023 at 04:32:48PM +0100, Ross Lagerwall wrote:
> When contiguous windows are coalesced, the resource is invalidated and
> consequently not added to the bus. However, it remains in the resource
> hierarchy:
> 
> ...
>   ef2fff00-ef2fffff : 0000:00:13.2
>     ef2fff00-ef2fffff : ehci_hcd
> 00000000-00000000 : PCI Bus 0000:00
> f0000000-f3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
>   f0000000-f3ffffff : Reserved
> ...
> 
> In some cases (e.g. the Xen scratch region), this causes future calls to
> allocate_resource() to choose an inappropriate location which the caller
> cannot handle. Fix by releasing the resource and removing from the
> hierarchy.
> 
> Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Applied to pci/resource for v6.5 with the commit log updated as below,
thanks!

Please let me know if I missed something, which is quite possible
because I don't know how to make the example in the commit log above
fit with the example in [1]:

  fec00000-fec7ffff : PCI Bus 0000:00
    fec00000-fec003ff : IOAPIC 0
  fec80000-fecbffff : PCI Bus 0000:00
    fec80000-fec803ff : IOAPIC 1
    fec90000-fec93fff : pnp 00:06

But maybe that's just because they're from different systems or
something.

Bjorn

[1] https://lore.kernel.org/r/DM6PR03MB53722DA6DF0E5D4E43C40753F051A@DM6PR03MB5372.namprd03.prod.outlook.com

    PCI: Release resource invalidated by coalescing
    
    When contiguous windows are coalesced by pci_register_host_bridge(), the
    second resource is expanded to include the first, and the first is
    invalidated and consequently not added to the bus. However, it remains in
    the resource hierarchy.  For example, these windows:
    
      fec00000-fec7ffff : PCI Bus 0000:00
      fec80000-fecbffff : PCI Bus 0000:00
    
    are coalesced into this, where the first resource remains in the tree with
    start/end zeroed out:
    
      00000000-00000000 : PCI Bus 0000:00
      fec00000-fecbffff : PCI Bus 0000:00
    
    In some cases (e.g. the Xen scratch region), this causes future calls to
    allocate_resource() to choose an inappropriate location which the caller
    cannot handle.
    
    Fix by releasing the zeroed-out resource and removing it from the resource
    hierarchy.
    
    [bhelgaas: commit log]
    Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
    Link: https://lore.kernel.org/r/20230525153248.712779-1-ross.lagerwall@citrix.com
    Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org	# v5.16+

> ---
>  drivers/pci/probe.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 0b2826c4a832..00ed20ac0dd6 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -997,8 +997,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>  	resource_list_for_each_entry_safe(window, n, &resources) {
>  		offset = window->offset;
>  		res = window->res;
> -		if (!res->flags && !res->start && !res->end)
> +		if (!res->flags && !res->start && !res->end) {
> +			release_resource(res);
>  			continue;
> +		}
>  
>  		list_move_tail(&window->node, &bridge->windows);
>  
> -- 
> 2.31.1
>
diff mbox series

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0b2826c4a832..00ed20ac0dd6 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -997,8 +997,10 @@  static int pci_register_host_bridge(struct pci_host_bridge *bridge)
 	resource_list_for_each_entry_safe(window, n, &resources) {
 		offset = window->offset;
 		res = window->res;
-		if (!res->flags && !res->start && !res->end)
+		if (!res->flags && !res->start && !res->end) {
+			release_resource(res);
 			continue;
+		}
 
 		list_move_tail(&window->node, &bridge->windows);