diff mbox series

[v2,1/1] PCI: Set no I/O resource for bridges behind the VMD controller

Message ID 1664288166-7432-1-git-send-email-lixiaochun.2888@163.com
State New
Series [v2,1/1] PCI: Set no I/O resource for bridges behind the VMD controller

Commit Message

Xiaochun Lee Sept. 27, 2022, 2:16 p.m. UTC
From: Xiaochun Lee <lixc17@lenovo.com>

When VMDs are enabled on Intel CPUs, VMD controllers (8086:28c0) are
claimed by the VMD driver, and scanning the bridges behind them and
assigning I/O resources produces the failure messages for BAR 13
listed below: each bridge wants 0x6000 of I/O space, but the host
bridge has no I/O resources at all.

VMD host bridge resources:
vmd 0000:e2:00.5: PCI host bridge to bus 10001:00
pci_bus 10001:00: root bus resource [bus 00-1f]
pci_bus 10001:00: root bus resource [mem 0xf4000000-0xf5ffffff]
pci_bus 10001:00: root bus resource [mem 0x29ffff02010-0x29fffffffff 64bit]
pci_bus 10001:00: scanning bus

Read bridge IO resource:
pci 10001:00:02.0: PCI bridge to [bus 01]
pci 10001:00:02.0:   bridge window [io  0x1000-0x6fff]
pci 10001:00:03.0: PCI bridge to [bus 02]
pci 10001:00:03.0:   bridge window [io  0x1000-0x6fff]

Failed messages of BAR#13:
pci 10001:00:02.0: BAR 13: no space for [io  size 0x6000]
pci 10001:00:02.0: BAR 13: failed to assign [io  size 0x6000]
pci 10001:00:03.0: BAR 13: no space for [io  size 0x6000]
pci 10001:00:03.0: BAR 13: failed to assign [io  size 0x6000]

VMD-enabled root ports use the Enhanced Configuration
Access Mechanism (ECAM) to access PCI Express configuration
space, and offer VMD_CFGBAR as the base of PCI Express
configuration space for the bridges behind them. The
configuration space includes I/O window registers, but these
I/O resources are not actually used on x86, so they only make
BAR 13 I/O assignment fail. Therefore, clear the I/O windows
of these bridges by writing an I/O base value greater than the
limit, so that, together with the kernel parameter
"pci=hpiosize=0KB", the failure messages no longer appear.

Signed-off-by: Xiaochun Lee <lixc17@lenovo.com>
---
 drivers/pci/quirks.c | 60 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

Comments

Bjorn Helgaas Sept. 27, 2022, 6:48 p.m. UTC | #1
On Tue, Sep 27, 2022 at 10:16:06PM +0800, Xiaochun Lee wrote:
> From: Xiaochun Lee <lixc17@lenovo.com>
> 
> When VMDs are enabled on Intel CPUs, VMD controllers (8086:28c0) are
> claimed by the VMD driver, and scanning the bridges behind them and
> assigning I/O resources produces the failure messages for BAR 13
> listed below: each bridge wants 0x6000 of I/O space, but the host
> bridge has no I/O resources at all.
> 
> VMD host bridge resources:
> vmd 0000:e2:00.5: PCI host bridge to bus 10001:00
> pci_bus 10001:00: root bus resource [bus 00-1f]
> pci_bus 10001:00: root bus resource [mem 0xf4000000-0xf5ffffff]
> pci_bus 10001:00: root bus resource [mem 0x29ffff02010-0x29fffffffff 64bit]
> pci_bus 10001:00: scanning bus
> 
> Read bridge IO resource:
> pci 10001:00:02.0: PCI bridge to [bus 01]
> pci 10001:00:02.0:   bridge window [io  0x1000-0x6fff]
> pci 10001:00:03.0: PCI bridge to [bus 02]
> pci 10001:00:03.0:   bridge window [io  0x1000-0x6fff]
> 
> Failed messages of BAR#13:
> pci 10001:00:02.0: BAR 13: no space for [io  size 0x6000]
> pci 10001:00:02.0: BAR 13: failed to assign [io  size 0x6000]
> pci 10001:00:03.0: BAR 13: no space for [io  size 0x6000]
> pci 10001:00:03.0: BAR 13: failed to assign [io  size 0x6000]
> 
> VMD-enabled root ports use the
> Enhanced Configuration Access Mechanism (ECAM) to access
> PCI Express configuration space, and offer VMD_CFGBAR as the
> base of PCI Express configuration space for the bridges
> behind them. The configuration space includes I/O window registers,

I don't quite understand this part.  ECAM is an MMIO method, so it
uses memory space, not I/O port space.

> but these I/O resources are not actually used on x86,
> so they only make BAR 13 I/O assignment fail. Therefore,
> clear the I/O windows of these bridges by writing an I/O base
> value greater than the limit, so that, together with the kernel
> parameter "pci=hpiosize=0KB", the failure messages no longer
> appear.

Is the only purpose of this patch to avoid the "no space" and "failed
to assign" messages?  Or is there something that actually doesn't
work?

I don't think it's worth making a quirk just to avoid the message.
There are several systems where I/O port space is not available, and
it *would* be nice to have a generic approach that handles that
better.

> Signed-off-by: Xiaochun Lee <lixc17@lenovo.com>
> ---
>  drivers/pci/quirks.c | 60 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 60 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 4944798e75b5..efecf12e8059 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5956,3 +5956,63 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56b1, aspm_l1_acceptable_latency
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c0, aspm_l1_acceptable_latency);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c1, aspm_l1_acceptable_latency);
>  #endif
> +
> +#if defined(CONFIG_X86_64) || defined(CONFIG_X86)
> +
> +#ifdef CONFIG_UML_X86
> +#define is_vmd(bus)             false
> +#endif /* CONFIG_UML_X86 */
> +
> +/*
> + * VMD-enabled root ports use the Enhanced Configuration Access Mechanism
> + * (ECAM) to access PCI Express configuration space, and offer VMD_CFGBAR
> + * as the base of PCI Express configuration space for the bridges behind
> + * them. The configuration space includes I/O window registers, but these
> + * I/O resources are not actually used on x86 and only make BAR 13 I/O
> + * assignment fail. Therefore, clear the I/O windows of these bridges by
> + * writing an I/O base value greater than the limit, so that, together
> + * with the kernel parameter "pci=hpiosize=0KB", the BAR 13 failure
> + * messages no longer appear.
> + */
> +static void quirk_vmd_no_iosize(struct pci_dev *bridge)
> +{
> +	u8 io_base_lo, io_limit_lo;
> +	u16 io_low;
> +	u32 io_upper16;
> +	unsigned long io_mask,  base, limit;
> +
> +	io_mask = PCI_IO_RANGE_MASK;
> +	if (bridge->io_window_1k)
> +		io_mask = PCI_IO_1K_RANGE_MASK;
> +
> +	/* VMD Domain */
> +	if (is_vmd(bridge->bus) && bridge->is_hotplug_bridge) {
> +		pci_read_config_byte(bridge, PCI_IO_BASE, &io_base_lo);
> +		pci_read_config_byte(bridge, PCI_IO_LIMIT, &io_limit_lo);
> +		base = (io_base_lo & io_mask) << 8;
> +		limit = (io_limit_lo & io_mask) << 8;
> +		/*
> +		 * If I/O ports are defined behind the bridge, clear them:
> +		 * x86 has only 64KB of I/O port space, and hotplug bridges
> +		 * with devices such as NVMe drives behind them do not need
> +		 * I/O port resources. Turn the window off by writing a base
> +		 * value greater than the limit to the base/limit registers.
> +		 */
> +		if (limit >= base) {
> +			/* Clear upper 16 bits of I/O base/limit */
> +			io_upper16 = 0;
> +			/* set base value greater than limit */
> +			io_low = 0x00f0;
> +
> +			/* Temporarily disable the I/O range before updating PCI_IO_BASE */
> +			pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff);
> +			/* Update lower 16 bits of I/O base/limit */
> +			pci_write_config_word(bridge, PCI_IO_BASE, io_low);
> +			/* Update upper 16 bits of I/O base/limit */
> +			pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16);
> +		}
> +	}
> +}
> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> +		PCI_CLASS_BRIDGE_PCI, 8, quirk_vmd_no_iosize);
> +#endif
> -- 
> 2.37.3
>
Xiaochun Li Sept. 28, 2022, 6:49 a.m. UTC | #2
On Wed, Sep 28, 2022 at 2:57 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Sep 27, 2022 at 10:16:06PM +0800, Xiaochun Lee wrote:
> > From: Xiaochun Lee <lixc17@lenovo.com>
> >
> > When VMDs are enabled on Intel CPUs, VMD controllers (8086:28c0) are
> > claimed by the VMD driver, and scanning the bridges behind them and
> > assigning I/O resources produces the failure messages for BAR 13
> > listed below: each bridge wants 0x6000 of I/O space, but the host
> > bridge has no I/O resources at all.
> >
> > VMD host bridge resources:
> > vmd 0000:e2:00.5: PCI host bridge to bus 10001:00
> > pci_bus 10001:00: root bus resource [bus 00-1f]
> > pci_bus 10001:00: root bus resource [mem 0xf4000000-0xf5ffffff]
> > pci_bus 10001:00: root bus resource [mem 0x29ffff02010-0x29fffffffff 64bit]
> > pci_bus 10001:00: scanning bus
> >
> > Read bridge IO resource:
> > pci 10001:00:02.0: PCI bridge to [bus 01]
> > pci 10001:00:02.0:   bridge window [io  0x1000-0x6fff]
> > pci 10001:00:03.0: PCI bridge to [bus 02]
> > pci 10001:00:03.0:   bridge window [io  0x1000-0x6fff]
> >
> > Failed messages of BAR#13:
> > pci 10001:00:02.0: BAR 13: no space for [io  size 0x6000]
> > pci 10001:00:02.0: BAR 13: failed to assign [io  size 0x6000]
> > pci 10001:00:03.0: BAR 13: no space for [io  size 0x6000]
> > pci 10001:00:03.0: BAR 13: failed to assign [io  size 0x6000]
> >
> > VMD-enabled root ports use the
> > Enhanced Configuration Access Mechanism (ECAM) to access
> > PCI Express configuration space, and offer VMD_CFGBAR as the
> > base of PCI Express configuration space for the bridges
> > behind them. The configuration space includes I/O window registers,
>
> I don't quite understand this part.  ECAM is an MMIO method, so it
> uses memory space, not I/O port space.

VMD reads the PCI I/O base through a special callback, "vmd_pci_read".
The call stack is as follows:

pci_read_bridge_windows()
        pci_read_config_word(bridge, PCI_IO_BASE, &io) # read BAR 13
                pci_bus_read_config_word()
                        bus->ops->read()
                                vmd_pci_read() # for VMD bus
                                        vmd_cfg_addr()
                                                offset = PCIE_ECAM_OFFSET(busnr_ecam, devfn, reg);
                                                return vmd->cfgbar + offset; # base is vmd->cfgbar
                                        readw(addr) # read value out

And vmd->cfgbar points to an iomap of VMD_CFGBAR, set up as:
vmd->cfgbar = pcim_iomap(dev, VMD_CFGBAR, 0);

The Intel Sapphire Rapids Server Processor spec (EDS volume two,
Revision 0.9), Chapter 9 "Volume Management Device (VMD) Registers",
Section 9.1.9 "VMD Configuration Base Address", shows that CFGBAR is
at offset 0x10.
dmesg lists the following:
[    2.091800] pci 0000:64:00.5: [8086:28c0] type 00 class 0x010400
[    2.091810] pci 0000:64:00.5: reg 0x10: [mem 0x24ffc000000-0x24ffdffffff 64bit pref]
[    2.091816] pci 0000:64:00.5: reg 0x18: [mem 0xe0000000-0xe1ffffff]
[    2.091827] pci 0000:64:00.5: reg 0x20: [mem 0x24ffff00000-0x24fffffffff 64bit]

The command "lspci -vvv -s 64:00.5" shows:
0000:64:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
        DeviceName: Intel RAID Controller CPU1
        Subsystem: Lenovo Device 010f
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        Region 0: Memory at 24ffc000000 (64-bit, prefetchable) [size=32M]
        Region 2: Memory at e0000000 (32-bit, non-prefetchable) [size=32M]
        Region 4: Memory at 24ffff00000 (64-bit, non-prefetchable) [size=1M]

So for VMD, the bridges' I/O base and I/O limit (BAR 13) are actually
read through the VMD host bridge's BAR 0 (offset 0x10), unlike normal
bridges on x86, which read configuration space via I/O ports
(0cf8-0cff : PCI conf1).

Also refer to the PCI Express Base Specification, Revision 5.0
Version 1.0, Section 7.2 "PCI Express Configuration Mechanisms",
Subsection 7.2.2 "PCI Express Enhanced Configuration Access
Mechanism (ECAM)", which describes it as follows:
The ECAM utilizes a flat memory-mapped address space to access device
configuration registers. In this case, the memory address determines the
configuration register accessed and the memory data updates (for a write)
or returns the contents of (for a read) the addressed register.

>
> > but these I/O resources are not actually used on x86,
> > so they only make BAR 13 I/O assignment fail. Therefore,
> > clear the I/O windows of these bridges by writing an I/O base
> > value greater than the limit, so that, together with the kernel
> > parameter "pci=hpiosize=0KB", the failure messages no longer
> > appear.
>
> Is the only purpose of this patch to avoid the "no space" and "failed
> to assign" messages?  Or is there something that actually doesn't
> work?

Yes, the only purpose is to avoid the failure messages; we have found
no functional impact so far. Our customers care a great deal about
these messages and keep complaining about them. For non-VMD hotplug
bridges, appending the kernel parameter "pci=hpiosize=0KB" keeps the
messages from showing up, but for bridges behind VMD it does not
work, and there is no workaround so far.

>
> I don't think it's worth making a quirk just to avoid the message.
> There are several systems where I/O port space is not available, and
> it *would* be nice to have a generic approach that handles that
> better.

The bridges' I/O base and limit (BAR 13) are read from their host
bridge's BAR 0 (offset 0x10), and may be an incorrect value set by the
VROC driver in UEFI. So it seems the firmware could fix this, but it
is really hard to push them to do so, which is why I prepared a quirk
for the affected bridges here. If we make a generic approach to avoid
the failure messages when assigning BAR 13 I/O resources, where would
you prefer to add it? Do you have any recommendations? Thanks!
Also, could you please share which systems have no usable I/O port
space? I would appreciate it.

>
> > Signed-off-by: Xiaochun Lee <lixc17@lenovo.com>
> > ---
> >  drivers/pci/quirks.c | 60 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 60 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4944798e75b5..efecf12e8059 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5956,3 +5956,63 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56b1, aspm_l1_acceptable_latency
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c0, aspm_l1_acceptable_latency);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c1, aspm_l1_acceptable_latency);
> >  #endif
> > +
> > +#if defined(CONFIG_X86_64) || defined(CONFIG_X86)
> > +
> > +#ifdef CONFIG_UML_X86
> > +#define is_vmd(bus)             false
> > +#endif /* CONFIG_UML_X86 */
> > +
> > +/*
> > + * VMD-enabled root ports use the Enhanced Configuration Access Mechanism
> > + * (ECAM) to access PCI Express configuration space, and offer VMD_CFGBAR
> > + * as the base of PCI Express configuration space for the bridges behind
> > + * them. The configuration space includes I/O window registers, but these
> > + * I/O resources are not actually used on x86 and only make BAR 13 I/O
> > + * assignment fail. Therefore, clear the I/O windows of these bridges by
> > + * writing an I/O base value greater than the limit, so that, together
> > + * with the kernel parameter "pci=hpiosize=0KB", the BAR 13 failure
> > + * messages no longer appear.
> > + */
> > +static void quirk_vmd_no_iosize(struct pci_dev *bridge)
> > +{
> > +     u8 io_base_lo, io_limit_lo;
> > +     u16 io_low;
> > +     u32 io_upper16;
> > +     unsigned long io_mask,  base, limit;
> > +
> > +     io_mask = PCI_IO_RANGE_MASK;
> > +     if (bridge->io_window_1k)
> > +             io_mask = PCI_IO_1K_RANGE_MASK;
> > +
> > +     /* VMD Domain */
> > +     if (is_vmd(bridge->bus) && bridge->is_hotplug_bridge) {
> > +             pci_read_config_byte(bridge, PCI_IO_BASE, &io_base_lo);
> > +             pci_read_config_byte(bridge, PCI_IO_LIMIT, &io_limit_lo);
> > +             base = (io_base_lo & io_mask) << 8;
> > +             limit = (io_limit_lo & io_mask) << 8;
> > +             /*
> > +              * If I/O ports are defined behind the bridge, clear them:
> > +              * x86 has only 64KB of I/O port space, and hotplug bridges
> > +              * with devices such as NVMe drives behind them do not need
> > +              * I/O port resources. Turn the window off by writing a base
> > +              * value greater than the limit to the base/limit registers.
> > +              */
> > +             if (limit >= base) {
> > +                     /* Clear upper 16 bits of I/O base/limit */
> > +                     io_upper16 = 0;
> > +                     /* set base value greater than limit */
> > +                     io_low = 0x00f0;
> > +
> > +                     /* Temporarily disable the I/O range before updating PCI_IO_BASE */
> > +                     pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff);
> > +                     /* Update lower 16 bits of I/O base/limit */
> > +                     pci_write_config_word(bridge, PCI_IO_BASE, io_low);
> > +                     /* Update upper 16 bits of I/O base/limit */
> > +                     pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16);
> > +             }
> > +     }
> > +}
> > +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > +             PCI_CLASS_BRIDGE_PCI, 8, quirk_vmd_no_iosize);
> > +#endif
> > --
> > 2.37.3
> >
>

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 4944798e75b5..efecf12e8059 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5956,3 +5956,63 @@  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56b1, aspm_l1_acceptable_latency
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c0, aspm_l1_acceptable_latency);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c1, aspm_l1_acceptable_latency);
 #endif
+
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86)
+
+#ifdef CONFIG_UML_X86
+#define is_vmd(bus)             false
+#endif /* CONFIG_UML_X86 */
+
+/*
+ * VMD-enabled root ports use the Enhanced Configuration Access Mechanism
+ * (ECAM) to access PCI Express configuration space, and offer VMD_CFGBAR
+ * as the base of PCI Express configuration space for the bridges behind
+ * them. The configuration space includes I/O window registers, but these
+ * I/O resources are not actually used on x86 and only make BAR 13 I/O
+ * assignment fail. Therefore, clear the I/O windows of these bridges by
+ * writing an I/O base value greater than the limit, so that, together
+ * with the kernel parameter "pci=hpiosize=0KB", the BAR 13 failure
+ * messages no longer appear.
+ */
+static void quirk_vmd_no_iosize(struct pci_dev *bridge)
+{
+	u8 io_base_lo, io_limit_lo;
+	u16 io_low;
+	u32 io_upper16;
+	unsigned long io_mask,  base, limit;
+
+	io_mask = PCI_IO_RANGE_MASK;
+	if (bridge->io_window_1k)
+		io_mask = PCI_IO_1K_RANGE_MASK;
+
+	/* VMD Domain */
+	if (is_vmd(bridge->bus) && bridge->is_hotplug_bridge) {
+		pci_read_config_byte(bridge, PCI_IO_BASE, &io_base_lo);
+		pci_read_config_byte(bridge, PCI_IO_LIMIT, &io_limit_lo);
+		base = (io_base_lo & io_mask) << 8;
+		limit = (io_limit_lo & io_mask) << 8;
+		/*
+		 * If I/O ports are defined behind the bridge, clear them:
+		 * x86 has only 64KB of I/O port space, and hotplug bridges
+		 * with devices such as NVMe drives behind them do not need
+		 * I/O port resources. Turn the window off by writing a base
+		 * value greater than the limit to the base/limit registers.
+		 */
+		if (limit >= base) {
+			/* Clear upper 16 bits of I/O base/limit */
+			io_upper16 = 0;
+			/* set base value greater than limit */
+			io_low = 0x00f0;
+
+			/* Temporarily disable the I/O range before updating PCI_IO_BASE */
+			pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff);
+			/* Update lower 16 bits of I/O base/limit */
+			pci_write_config_word(bridge, PCI_IO_BASE, io_low);
+			/* Update upper 16 bits of I/O base/limit */
+			pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16);
+		}
+	}
+}
+DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID,
+		PCI_CLASS_BRIDGE_PCI, 8, quirk_vmd_no_iosize);
+#endif