diff mbox series

[v2,1/4] efi/x86: Remove EfiMemoryMappedIO from E820 map

Message ID 20221208190341.1560157-2-helgaas@kernel.org
State New
Headers show
Series PCI: Continue E820 vs host bridge window saga | expand

Commit Message

Bjorn Helgaas Dec. 8, 2022, 7:03 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
by the OS so they can be accessed by EFI runtime services, but should have
no other significance to the OS (UEFI r2.10, sec 7.2).  However, most
bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
them (see remove_e820_regions()).

Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
host bridge windows, which means Linux can't allocate BAR space for
hot-added devices.

Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
problem.

Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
platforms, these describe non-window space that's included in host bridge
_CRS.  If we assign that space to PCI devices, they don't work.  On the
Lenovo X1 Carbon, this leads to suspend/resume failures.

The previous solution to the problem of allocating BARs in these regions
was to add pci_crs_quirks[] entries to disable E820 checking for these
machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping
via quirks")):

  Acer   DMI_PRODUCT_NAME    Spin SP513-54N
  Clevo  DMI_BOARD_NAME      X170KM-G
  Lenovo DMI_PRODUCT_VERSION *IIL*

Florent reported the BAR allocation issue on the Clevo NL4XLU.  We could
add another quirk for the NL4XLU, but I hope this generic change can solve
it for many machines without having to add quirks.

This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
Slim 3 (Matt) and solves the problem even when overriding the existing
quirks by booting with "pci=use_e820".

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565     Clevo NL4XLU
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899    Ideapad Slim 3
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207    X1 Carbon
Reported-by: Florent DELAHAYE <kernelorg@undead.fr>
Tested-by: Konrad J Hambrick <kjhambrick@gmail.com>
Tested-by: Matt Hansen <2lprbe78@duck.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Hans de Goede <hdegoede@redhat.com>
---
 arch/x86/platform/efi/efi.c | 46 +++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

Comments

Hans de Goede Dec. 9, 2022, 8:06 a.m. UTC | #1
Hi,

One comment (logging bug in patch) below:

On 12/8/22 20:03, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
> by the OS so they can be accessed by EFI runtime services, but should have
> no other significance to the OS (UEFI r2.10, sec 7.2).  However, most
> bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
> E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
> them (see remove_e820_regions()).
> 
> Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
> host bridge windows, which means Linux can't allocate BAR space for
> hot-added devices.
> 
> Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
> problem.
> 
> Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
> platforms, these describe non-window space that's included in host bridge
> _CRS.  If we assign that space to PCI devices, they don't work.  On the
> Lenovo X1 Carbon, this leads to suspend/resume failures.
> 
> The previous solution to the problem of allocating BARs in these regions
> was to add pci_crs_quirks[] entries to disable E820 checking for these
> machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping
> via quirks")):
> 
>   Acer   DMI_PRODUCT_NAME    Spin SP513-54N
>   Clevo  DMI_BOARD_NAME      X170KM-G
>   Lenovo DMI_PRODUCT_VERSION *IIL*
> 
> Florent reported the BAR allocation issue on the Clevo NL4XLU.  We could
> add another quirk for the NL4XLU, but I hope this generic change can solve
> it for many machines without having to add quirks.
> 
> This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
> Slim 3 (Matt) and solves the problem even when overriding the existing
> quirks by booting with "pci=use_e820".
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565     Clevo NL4XLU
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899    Ideapad Slim 3
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207    X1 Carbon
> Reported-by: Florent DELAHAYE <kernelorg@undead.fr>
> Tested-by: Konrad J Hambrick <kjhambrick@gmail.com>
> Tested-by: Matt Hansen <2lprbe78@duck.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Hans de Goede <hdegoede@redhat.com>
> ---
>  arch/x86/platform/efi/efi.c | 46 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index ebc98a68c400..dee1852e95cd 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -303,6 +303,50 @@ static void __init efi_clean_memmap(void)
>  	}
>  }
>  
> +/*
> + * Firmware can use EfiMemoryMappedIO to request that MMIO regions be
> + * mapped by the OS so they can be accessed by EFI runtime services, but
> + * should have no other significance to the OS (UEFI r2.10, sec 7.2).
> + * However, most bootloaders and EFI stubs convert EfiMemoryMappedIO
> + * regions to E820_TYPE_RESERVED entries, which prevent Linux from
> + * allocating space from them (see remove_e820_regions()).
> + *
> + * Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and
> + * PCI host bridge windows, which means Linux can't allocate BAR space for
> + * hot-added devices.
> + *
> + * Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
> + * problem.
> + *
> + * Retain small EfiMemoryMappedIO regions because on some platforms, these
> + * describe non-window space that's included in host bridge _CRS.  If we
> + * assign that space to PCI devices, they don't work.
> + */
> +static void __init efi_remove_e820_mmio(void)
> +{
> +	efi_memory_desc_t *md;
> +	u64 size, start, end;
> +	int i = 0;
> +
> +	for_each_efi_memory_desc(md) {
> +		if (md->type == EFI_MEMORY_MAPPED_IO) {
> +			size = md->num_pages << EFI_PAGE_SHIFT;
> +			if (size >= 256*1024) {
> +				start = md->phys_addr;
> +				end = start + size - 1;
> +				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
> +					i, start, end, size >> 20);
> +				e820__range_remove(start, size,
> +						   E820_TYPE_RESERVED, 1);
> +			} else {
> +				pr_info("Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map\n",
> +					i, start, end, size >> 10);

The logging in this else is re-using the start and end from the previous section which was actually removed.

E.g. Matt's latest log from:
https://bugzilla.redhat.com/show_bug.cgi?id=1868899
has:

[    0.000000] e820: remove [mem 0xfc800000-0xfe7fffff] reserved
[    0.000000] efi: Not removing mem46: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map
[    0.000000] efi: Not removing mem47: MMIO range=[0xfc800000-0xfe7fffff] (32KB) from e820 map
[    0.000000] efi: Not removing mem49: MMIO range=[0xfc800000-0xfe7fffff] (8KB) from e820 map
[    0.000000] efi: Not removing mem50: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map

Notice how all the "Not removing ..." lines log the same range as
the actually removed map entry above them.

Regards,

Hans








> +			}
> +		}
> +		i++;
> +	}
> +}
> +
>  void __init efi_print_memmap(void)
>  {
>  	efi_memory_desc_t *md;
> @@ -474,6 +518,8 @@ void __init efi_init(void)
>  	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
>  	efi_clean_memmap();
>  
> +	efi_remove_e820_mmio();
> +
>  	if (efi_enabled(EFI_DBG))
>  		efi_print_memmap();
>  }
Hans de Goede Dec. 9, 2022, 11:04 a.m. UTC | #2
Hi,

On 12/9/22 09:06, Hans de Goede wrote:
> Hi,
> 
> One comment (logging bug in patch) below:
> 
> On 12/8/22 20:03, Bjorn Helgaas wrote:
>> From: Bjorn Helgaas <bhelgaas@google.com>
>>
>> Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
>> by the OS so they can be accessed by EFI runtime services, but should have
>> no other significance to the OS (UEFI r2.10, sec 7.2).  However, most
>> bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
>> E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
>> them (see remove_e820_regions()).
>>
>> Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
>> host bridge windows, which means Linux can't allocate BAR space for
>> hot-added devices.
>>
>> Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
>> problem.
>>
>> Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
>> platforms, these describe non-window space that's included in host bridge
>> _CRS.  If we assign that space to PCI devices, they don't work.  On the
>> Lenovo X1 Carbon, this leads to suspend/resume failures.
>>
>> The previous solution to the problem of allocating BARs in these regions
>> was to add pci_crs_quirks[] entries to disable E820 checking for these
>> machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping
>> via quirks")):
>>
>>   Acer   DMI_PRODUCT_NAME    Spin SP513-54N
>>   Clevo  DMI_BOARD_NAME      X170KM-G
>>   Lenovo DMI_PRODUCT_VERSION *IIL*
>>
>> Florent reported the BAR allocation issue on the Clevo NL4XLU.  We could
>> add another quirk for the NL4XLU, but I hope this generic change can solve
>> it for many machines without having to add quirks.
>>
>> This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
>> Slim 3 (Matt) and solves the problem even when overriding the existing
>> quirks by booting with "pci=use_e820".
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565     Clevo NL4XLU
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899    Ideapad Slim 3
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207    X1 Carbon
>> Reported-by: Florent DELAHAYE <kernelorg@undead.fr>
>> Tested-by: Konrad J Hambrick <kjhambrick@gmail.com>
>> Tested-by: Matt Hansen <2lprbe78@duck.com>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Hans de Goede <hdegoede@redhat.com>
>> ---
>>  arch/x86/platform/efi/efi.c | 46 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 46 insertions(+)
>>
>> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>> index ebc98a68c400..dee1852e95cd 100644
>> --- a/arch/x86/platform/efi/efi.c
>> +++ b/arch/x86/platform/efi/efi.c
>> @@ -303,6 +303,50 @@ static void __init efi_clean_memmap(void)
>>  	}
>>  }
>>  
>> +/*
>> + * Firmware can use EfiMemoryMappedIO to request that MMIO regions be
>> + * mapped by the OS so they can be accessed by EFI runtime services, but
>> + * should have no other significance to the OS (UEFI r2.10, sec 7.2).
>> + * However, most bootloaders and EFI stubs convert EfiMemoryMappedIO
>> + * regions to E820_TYPE_RESERVED entries, which prevent Linux from
>> + * allocating space from them (see remove_e820_regions()).
>> + *
>> + * Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and
>> + * PCI host bridge windows, which means Linux can't allocate BAR space for
>> + * hot-added devices.
>> + *
>> + * Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
>> + * problem.
>> + *
>> + * Retain small EfiMemoryMappedIO regions because on some platforms, these
>> + * describe non-window space that's included in host bridge _CRS.  If we
>> + * assign that space to PCI devices, they don't work.
>> + */
>> +static void __init efi_remove_e820_mmio(void)
>> +{
>> +	efi_memory_desc_t *md;
>> +	u64 size, start, end;
>> +	int i = 0;
>> +
>> +	for_each_efi_memory_desc(md) {
>> +		if (md->type == EFI_MEMORY_MAPPED_IO) {
>> +			size = md->num_pages << EFI_PAGE_SHIFT;
>> +			if (size >= 256*1024) {
>> +				start = md->phys_addr;
>> +				end = start + size - 1;
>> +				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
>> +					i, start, end, size >> 20);
>> +				e820__range_remove(start, size,
>> +						   E820_TYPE_RESERVED, 1);
>> +			} else {
>> +				pr_info("Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map\n",
>> +					i, start, end, size >> 10);
> 
> The logging in this else is re-using the start and end from the previous section which was actually removed.
> 
> E.g. Matt's latest log from:
> https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> has:
> 
> [    0.000000] e820: remove [mem 0xfc800000-0xfe7fffff] reserved
> [    0.000000] efi: Not removing mem46: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map
> [    0.000000] efi: Not removing mem47: MMIO range=[0xfc800000-0xfe7fffff] (32KB) from e820 map
> [    0.000000] efi: Not removing mem49: MMIO range=[0xfc800000-0xfe7fffff] (8KB) from e820 map
> [    0.000000] efi: Not removing mem50: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map
> 
> Notice how all the "Not removing ..." lines log the same range as
> the actually removed map entry above them.

I realize the fix is very obvious, but since I just fixed this in my
local tree anyways, here is my fix for this:

--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -331,9 +331,9 @@ static void __init efi_remove_e820_mmio(void)
 	for_each_efi_memory_desc(md) {
 		if (md->type == EFI_MEMORY_MAPPED_IO) {
 			size = md->num_pages << EFI_PAGE_SHIFT;
+			start = md->phys_addr;
+			end = start + size - 1;
 			if (size >= 256*1024) {
-				start = md->phys_addr;
-				end = start + size - 1;
 				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
 					i, start, end, size >> 20);
 				e820__range_remove(start, size,

Regards,

Hans






>> +			}
>> +		}
>> +		i++;
>> +	}
>> +}
>> +
>>  void __init efi_print_memmap(void)
>>  {
>>  	efi_memory_desc_t *md;
>> @@ -474,6 +518,8 @@ void __init efi_init(void)
>>  	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
>>  	efi_clean_memmap();
>>  
>> +	efi_remove_e820_mmio();
>> +
>>  	if (efi_enabled(EFI_DBG))
>>  		efi_print_memmap();
>>  }
>
Bjorn Helgaas Dec. 9, 2022, 8:10 p.m. UTC | #3
On Fri, Dec 09, 2022 at 12:04:53PM +0100, Hans de Goede wrote:
> On 12/9/22 09:06, Hans de Goede wrote:
> > One comment (logging bug in patch) below:
> > ...

> > The logging in this else is re-using the start and end from the previous section which was actually removed.
> > 
> > E.g. Matt's latest log from:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> > has:
> > 
> > [    0.000000] e820: remove [mem 0xfc800000-0xfe7fffff] reserved
> > [    0.000000] efi: Not removing mem46: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map
> > [    0.000000] efi: Not removing mem47: MMIO range=[0xfc800000-0xfe7fffff] (32KB) from e820 map
> > [    0.000000] efi: Not removing mem49: MMIO range=[0xfc800000-0xfe7fffff] (8KB) from e820 map
> > [    0.000000] efi: Not removing mem50: MMIO range=[0xfc800000-0xfe7fffff] (4KB) from e820 map
> > 
> > Notice how all the "Not removing ..." lines log the same range as
> > the actually removed map entry above them.
> 
> I realize the fix is very obvious, but since I just fixed this in my
> local tree anyways, here is my fix for this:

Thank you!  Incorporated.

> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -331,9 +331,9 @@ static void __init efi_remove_e820_mmio(void)
>  	for_each_efi_memory_desc(md) {
>  		if (md->type == EFI_MEMORY_MAPPED_IO) {
>  			size = md->num_pages << EFI_PAGE_SHIFT;
> +			start = md->phys_addr;
> +			end = start + size - 1;
>  			if (size >= 256*1024) {
> -				start = md->phys_addr;
> -				end = start + size - 1;
>  				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
>  					i, start, end, size >> 20);
>  				e820__range_remove(start, size,
>
Baowen Zheng Jan. 13, 2023, 10:46 a.m. UTC | #4
Hello Bjorn Helgaas:

When we tested our NIC, we found we will fail to access pcie extent configuration space.
Actually we can only access the 256 Bytes of the pcie configuration space.
You can make check with command of "lspci -s XXXX -vv"
It seems relates to this change, could you please help to make a check?


On Thu, 8 Dec 2022 13:03:38 Bjorn Helgaas wrote:
>Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
>by the OS so they can be accessed by EFI runtime services, but should have
>no other significance to the OS (UEFI r2.10, sec 7.2).  However, most
>bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
>E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
>them (see remove_e820_regions()).
>
>Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
>host bridge windows, which means Linux can't allocate BAR space for
>hot-added devices.
>
>Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
>problem.
>

...

>
>Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565     Clevo NL4XLU
>Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
>Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899    Ideapad Slim 3
>Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207    X1 Carbon
>Reported-by: Florent DELAHAYE <kernelorg@undead.fr>
>Tested-by: Konrad J Hambrick <kjhambrick@gmail.com>
>Tested-by: Matt Hansen <2lprbe78@duck.com>
>Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>Cc: Hans de Goede <hdegoede@redhat.com>
>---
> arch/x86/platform/efi/efi.c | 46 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
>diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>index ebc98a68c400..dee1852e95cd 100644
>--- a/arch/x86/platform/efi/efi.c
>+++ b/arch/x86/platform/efi/efi.c
>@@ -303,6 +303,50 @@ static void __init efi_clean_memmap(void)
> 	}
> }
> 
>+/*
>+ * Firmware can use EfiMemoryMappedIO to request that MMIO regions be
>+ * mapped by the OS so they can be accessed by EFI runtime services, but
>+ * should have no other significance to the OS (UEFI r2.10, sec 7.2).
>+ * However, most bootloaders and EFI stubs convert EfiMemoryMappedIO
>+ * regions to E820_TYPE_RESERVED entries, which prevent Linux from
>+ * allocating space from them (see remove_e820_regions()).
>+ *
>+ * Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and
>+ * PCI host bridge windows, which means Linux can't allocate BAR space for
>+ * hot-added devices.
>+ *
>+ * Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
>+ * problem.
>+ *
>+ * Retain small EfiMemoryMappedIO regions because on some platforms, these
>+ * describe non-window space that's included in host bridge _CRS.  If we
>+ * assign that space to PCI devices, they don't work.
>+ */
>+static void __init efi_remove_e820_mmio(void)
>+{
>+	efi_memory_desc_t *md;
>+	u64 size, start, end;
>+	int i = 0;
>+
>+	for_each_efi_memory_desc(md) {
>+		if (md->type == EFI_MEMORY_MAPPED_IO) {
>+			size = md->num_pages << EFI_PAGE_SHIFT;
>+			if (size >= 256*1024) {
>+				start = md->phys_addr;
>+				end = start + size - 1;
>+				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
>+					i, start, end, size >> 20);
>+				e820__range_remove(start, size,
>+						   E820_TYPE_RESERVED, 1);
>+			} else {
>+				pr_info("Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map\n",
>+					i, start, end, size >> 10);
>+			}
>+		}
>+		i++;
>+	}
>+}
>+
> void __init efi_print_memmap(void)
> {
> 	efi_memory_desc_t *md;
>@@ -474,6 +518,8 @@ void __init efi_init(void)
> 	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
> 	efi_clean_memmap();
> 
>+	efi_remove_e820_mmio();
>+
> 	if (efi_enabled(EFI_DBG))
> 		efi_print_memmap();
> }
Bjorn Helgaas Jan. 13, 2023, 1:54 p.m. UTC | #5
On Fri, Jan 13, 2023 at 06:46:03PM +0800, Baowen Zheng wrote:
> Hello Bjorn Helgaas:
> 
> When we tested our NIC, we found we will fail to access pcie extent configuration space.
> Actually we can only access the 256 Bytes of the pcie configuration space.
> You can make check with command of "lspci -s XXXX -vv"
> It seems relates to this change, could you please help to make a check?

Can you test these patches?

  https://lore.kernel.org/linux-pci/20230110180243.1590045-1-helgaas@kernel.org/

Let me know if they don't fix the problem.

Bjorn
diff mbox series

Patch

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index ebc98a68c400..dee1852e95cd 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -303,6 +303,50 @@  static void __init efi_clean_memmap(void)
 	}
 }
 
+/*
+ * Firmware can use EfiMemoryMappedIO to request that MMIO regions be
+ * mapped by the OS so they can be accessed by EFI runtime services, but
+ * should have no other significance to the OS (UEFI r2.10, sec 7.2).
+ * However, most bootloaders and EFI stubs convert EfiMemoryMappedIO
+ * regions to E820_TYPE_RESERVED entries, which prevent Linux from
+ * allocating space from them (see remove_e820_regions()).
+ *
+ * Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and
+ * PCI host bridge windows, which means Linux can't allocate BAR space for
+ * hot-added devices.
+ *
+ * Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
+ * problem.
+ *
+ * Retain small EfiMemoryMappedIO regions because on some platforms, these
+ * describe non-window space that's included in host bridge _CRS.  If we
+ * assign that space to PCI devices, they don't work.
+ */
+static void __init efi_remove_e820_mmio(void)
+{
+	efi_memory_desc_t *md;
+	u64 size, start, end;
+	int i = 0;
+
+	for_each_efi_memory_desc(md) {
+		if (md->type == EFI_MEMORY_MAPPED_IO) {
+			size = md->num_pages << EFI_PAGE_SHIFT;
+			if (size >= 256*1024) {
+				start = md->phys_addr;
+				end = start + size - 1;
+				pr_info("Remove mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluMB) from e820 map\n",
+					i, start, end, size >> 20);
+				e820__range_remove(start, size,
+						   E820_TYPE_RESERVED, 1);
+			} else {
+				pr_info("Not removing mem%02u: MMIO range=[0x%08llx-0x%08llx] (%lluKB) from e820 map\n",
+					i, start, end, size >> 10);
+			}
+		}
+		i++;
+	}
+}
+
 void __init efi_print_memmap(void)
 {
 	efi_memory_desc_t *md;
@@ -474,6 +518,8 @@  void __init efi_init(void)
 	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
 	efi_clean_memmap();
 
+	efi_remove_e820_mmio();
+
 	if (efi_enabled(EFI_DBG))
 		efi_print_memmap();
 }