[v2] x86/PCI: Mark Power Control Unit as having non-compliant BARs

Message ID 1589537271-46459-1-git-send-email-lixiaochun.2888@163.com
State New

Commit Message

Xiaochun Lee May 15, 2020, 10:07 a.m. UTC
From: Xiaochun Lee <lixc17@lenovo.com>

The device [8086:a26c] is the Power Control Unit of the Intel Ice Lake
Server Processor, and devices [8086:a1ec,a1ed] are the Power Control
Units of the Intel Xeon Scalable Processor. The kernel treats their
PCI BARs as ordinary base address registers, which leads to a boot
failure like:
"pci 0000:00:11.0: [Firmware Bug]: reg 0x30: invalid BAR (can't size)".

The symptom was seen on this Ice Lake processor:
"QU99 ICE LAKE ES1 HCC 24C 185W 3200 L-0"

Information for device [8086:a26c] is listed below:
00:11.0 Unassigned class [ff00]: Intel Corporation Device a26c (rev 03)
        Subsystem: Lenovo Device 7811
        Flags: fast devsel, NUMA node 0
        Expansion ROM at <ignored> [disabled]
        Capabilities: [80] Power Management version 3

The symptom was seen on these Xeon Scalable processors:
"Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz"
"Intel(R) Xeon(R) Gold 6252 CPU @ 2.00GHz"

Information for device [8086:a1ec] is listed below:
00:11.0 Unassigned class [ff00]: Intel Corporation C620 Series Chipset Family MROM 0 [8086:a1ec] (rev 09)
        Subsystem: Lenovo Device [17aa:7805]
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        Expansion ROM at <ignored> [disabled]
        Capabilities: [80] Power Management version 3

There are no other BARs on these devices, so mark the PCU as having
non-compliant BARs so that we don't try to probe any of them.

Signed-off-by: Xiaochun Lee <lixc17@lenovo.com>
---
 arch/x86/pci/fixup.c | 6 ++++++
 1 file changed, 6 insertions(+)
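
For readers unfamiliar with the "can't size" wording in the message above:
a BAR is conventionally sized by writing all 1s to it and reading the value
back. The block below is a minimal sketch of that decode step for a 32-bit
memory BAR, not the kernel's implementation. The names
bar_size_from_readback() and MEM_BAR_ADDR_MASK are hypothetical and chosen
only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code.
 * For a 32-bit memory BAR the low 4 bits are type/prefetch flags,
 * so only the upper 28 bits take part in sizing.
 */
#define MEM_BAR_ADDR_MASK 0xfffffff0u

/*
 * Decode the size of a BAR from the value read back after all 1s
 * were written to it. Returns 0 when no address bit is writable,
 * i.e. when there is nothing that can be sized.
 */
uint32_t bar_size_from_readback(uint32_t readback)
{
	uint32_t addr_bits = readback & MEM_BAR_ADDR_MASK;

	if (addr_bits == 0)
		return 0;

	/* The lowest writable address bit gives the decode size. */
	return addr_bits & (~addr_bits + 1u);
}
```

A readback of 0xfff00000 decodes to a 1 MiB region under this sketch. A
register that is not really a BAR may return a value that cannot be
interpreted this way, which is the kind of situation the quoted message
reports.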

Comments

Bjorn Helgaas May 15, 2020, 7:23 p.m. UTC | #1
On Fri, May 15, 2020 at 06:07:51AM -0400, Xiaochun Lee wrote:
> From: Xiaochun Lee <lixc17@lenovo.com>
> 
> The device [8086:a26c] is the Power Control Unit of the Intel Ice Lake
> Server Processor, and devices [8086:a1ec,a1ed] are the Power Control
> Units of the Intel Xeon Scalable Processor. The kernel treats their
> PCI BARs as ordinary base address registers, which leads to a boot
> failure like:
> "pci 0000:00:11.0: [Firmware Bug]: reg 0x30: invalid BAR (can't size)".

Do you have a spec that says these are Power Control Units?  The spec
I found for the C620 PCH claims these are all "MROM" devices related
to "Enterprise Value Add", "Intel Management Engine", and "Innovation
Engine" configuration.

I updated the commit log, added [8086:a26d] as mentioned in that spec,
added a stable tag, and applied the patch below to pci/misc for v5.8.
Let me know if that doesn't look right.

> The symptom was seen on this Ice Lake processor:
> "QU99 ICE LAKE ES1 HCC 24C 185W 3200 L-0"
> 
> Information for device [8086:a26c] is listed below:
> 00:11.0 Unassigned class [ff00]: Intel Corporation Device a26c (rev 03)
>         Subsystem: Lenovo Device 7811
>         Flags: fast devsel, NUMA node 0
>         Expansion ROM at <ignored> [disabled]
>         Capabilities: [80] Power Management version 3
> 
> The symptom was seen on these Xeon Scalable processors:
> "Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz"
> "Intel(R) Xeon(R) Gold 6252 CPU @ 2.00GHz"
> 
> Information for device [8086:a1ec] is listed below:
> 00:11.0 Unassigned class [ff00]: Intel Corporation C620 Series Chipset Family MROM 0 [8086:a1ec] (rev 09)
>         Subsystem: Lenovo Device [17aa:7805]
>         Latency: 0, Cache Line Size: 64 bytes
>         NUMA node: 0
>         Expansion ROM at <ignored> [disabled]
>         Capabilities: [80] Power Management version 3
> 
> There are no other BARs on these devices, so mark the PCU as having
> non-compliant BARs so that we don't try to probe any of them.
> 
> Signed-off-by: Xiaochun Lee <lixc17@lenovo.com>
> ---
>  arch/x86/pci/fixup.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index e723559..d9abc67 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -563,6 +563,9 @@ static void twinhead_reserve_killing_zone(struct pci_dev *dev)
>   * Erratum BDF2
>   * PCI BARs in the Home Agent Will Return Non-Zero Values During Enumeration
>   * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
> + *
> + * Device [8086:a26c]
> + * Devices [8086:a1ec,a1ed]
>   */
>  static void pci_invalid_bar(struct pci_dev *dev)
>  {
> @@ -572,6 +575,9 @@ static void pci_invalid_bar(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6f60, pci_invalid_bar);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fa0, pci_invalid_bar);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, pci_invalid_bar);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa1ec, pci_invalid_bar);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa1ed, pci_invalid_bar);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa26c, pci_invalid_bar);
>  
>  /*
>   * Device [1022:7808]
> -- 
> 1.8.3.1

commit 1574051e52cb ("x86/PCI: Mark Intel C620 MROMs as having non-compliant BARs")
Author: Xiaochun Lee <lixc17@lenovo.com>
Date:   Thu May 14 23:31:07 2020 -0400

    x86/PCI: Mark Intel C620 MROMs as having non-compliant BARs
    
    The Intel C620 Platform Controller Hub has MROM functions that have non-PCI
    registers (undocumented in the public spec) where BAR 0 is supposed to be,
    which results in messages like this:
    
      pci 0000:00:11.0: [Firmware Bug]: reg 0x30: invalid BAR (can't size)
    
    Mark these MROM functions as having non-compliant BARs so we don't try to
    probe any of them.  There are no other BARs on these devices.
    
    See the Intel C620 Series Chipset Platform Controller Hub Datasheet,
    May 2019, Document Number 336067-007US, sec 2.1, 35.5, 35.6.
    
    [bhelgaas: commit log, add 0xa26d]
    Link: https://lore.kernel.org/r/1589513467-17070-1-git-send-email-lixiaochun.2888@163.com
    Signed-off-by: Xiaochun Lee <lixc17@lenovo.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: stable@vger.kernel.org

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index e723559c386a..0c67a5a94de3 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -572,6 +572,10 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2fc0, pci_invalid_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6f60, pci_invalid_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fa0, pci_invalid_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa1ec, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa1ed, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa26c, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa26d, pci_invalid_bar);
 
 /*
  * Device [1022:7808]

Patch

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index e723559..d9abc67 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -563,6 +563,9 @@  static void twinhead_reserve_killing_zone(struct pci_dev *dev)
  * Erratum BDF2
  * PCI BARs in the Home Agent Will Return Non-Zero Values During Enumeration
  * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
+ *
+ * Device [8086:a26c]
+ * Devices [8086:a1ec,a1ed]
  */
 static void pci_invalid_bar(struct pci_dev *dev)
 {
@@ -572,6 +575,9 @@  static void pci_invalid_bar(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6f60, pci_invalid_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fa0, pci_invalid_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa1ec, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa1ed, pci_invalid_bar);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0xa26c, pci_invalid_bar);
 
 /*
  * Device [1022:7808]
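
The effect of the added DECLARE_PCI_FIXUP_EARLY() lines can be pictured with
a small user-space model: an early fixup table matches on vendor/device ID
and sets a flag, and BAR probing is skipped for any device that carries the
flag. This is only a sketch under stated assumptions, not the kernel's code.
struct fake_pci_dev, early_fixups[], run_early_fixups() and probe_bars() are
hypothetical stand-ins; the device IDs are the four listed in the applied
commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct pci_dev (assumption). */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	bool non_compliant_bars; /* set by the fixup hook */
	bool bars_probed;        /* set if BAR probing actually ran */
};

typedef void (*fixup_fn)(struct fake_pci_dev *dev);

struct fixup_entry {
	uint16_t vendor;
	uint16_t device;
	fixup_fn hook;
};

/* Plays the role of pci_invalid_bar(): flag the device, nothing else. */
void mark_invalid_bar(struct fake_pci_dev *dev)
{
	dev->non_compliant_bars = true;
}

/* Hypothetical table standing in for the DECLARE_PCI_FIXUP_EARLY() entries. */
static const struct fixup_entry early_fixups[] = {
	{ 0x8086, 0xa1ec, mark_invalid_bar },
	{ 0x8086, 0xa1ed, mark_invalid_bar },
	{ 0x8086, 0xa26c, mark_invalid_bar },
	{ 0x8086, 0xa26d, mark_invalid_bar },
};

/* Run every early fixup whose vendor/device pair matches the device. */
void run_early_fixups(struct fake_pci_dev *dev)
{
	size_t n = sizeof(early_fixups) / sizeof(early_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		if (early_fixups[i].vendor == dev->vendor &&
		    early_fixups[i].device == dev->device)
			early_fixups[i].hook(dev);
	}
}

/* BAR probing is skipped entirely for flagged devices. */
void probe_bars(struct fake_pci_dev *dev)
{
	if (dev->non_compliant_bars)
		return;
	dev->bars_probed = true;
}
```

In this model, calling run_early_fixups() and then probe_bars() on a device
with ID [8086:a1ec] leaves bars_probed false, while a device with an
unlisted ID is probed as usual.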