diff mbox series

[v3] x86/PCI: Ignore E820 reservations for bridge windows on newer systems

Message ID 20211014110357.17957-1-hdegoede@redhat.com
State New
Headers show
Series [v3] x86/PCI: Ignore E820 reservations for bridge windows on newer systems | expand

Commit Message

Hans de Goede Oct. 14, 2021, 11:03 a.m. UTC
Some BIOS-es contain a bug where they add addresses which map to system RAM
in the PCI bridge memory window returned by the ACPI _CRS method, see
commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
space").

To avoid this Linux by default excludes E820 reservations when allocating
addresses since 2010. Windows however ignores E820 reserved regions for PCI
mem allocations, so in hindsight Linux honoring them is a problem.

Recently (2020) some systems have shown-up with E820 reservations which
cover the entire _CRS returned PCI bridge memory window, causing all
attempts to assign memory to PCI BARs which have not been setup by the
BIOS to fail. For example here are the relevant dmesg bits from a
Lenovo IdeaPad 3 15IIL 81WE:

 [mem 0x000000004bc50000-0x00000000cfffffff] reserved
 pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]

Ideally Linux would fully stop honoring E820 reservations for PCI mem
allocations, but then the old systems this was added for will regress.
Instead keep the old behavior for old systems, while ignoring the E820
reservations like Windows does for any systems from now on.

Old systems are defined here as BIOS year < 2018, this was chosen to
make sure that pci_use_e820 will not be set on the currently affected
systems, while at the same time also taking into account that the
systems for which the E820 checking was originally added may have
received BIOS updates for quite a while (esp. CVE related ones),
giving them a more recent BIOS year then 2010.

Also add pci=no_e820 and pci=use_e820 options to allow overriding
the BIOS year heuristic.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
BugLink: https://bugs.launchpad.net/bugs/1878279
BugLink: https://bugs.launchpad.net/bugs/1931715
BugLink: https://bugs.launchpad.net/bugs/1932069
BugLink: https://bugs.launchpad.net/bugs/1921649
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
Changes in v3:
- Commit msg tweaks (drop dmesg timestamps, typo fix)
- Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
- Add Mika's Reviewed-by

Changes in v2:
- Replace the per model DMI quirk approach with disabling E820 reservations
  checking for all systems with a BIOS year >= 2018
- Add documentation for the new kernel-parameters to
  Documentation/admin-guide/kernel-parameters.txt
---
Other patches trying to address the same issue:
https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
V1 patch:
https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
---
 .../admin-guide/kernel-parameters.txt         |  6 ++++
 arch/x86/include/asm/pci_x86.h                | 10 +++++++
 arch/x86/kernel/resource.c                    |  4 +++
 arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
 arch/x86/pci/common.c                         |  6 ++++
 5 files changed, 55 insertions(+)

Comments

Rafael J. Wysocki Oct. 14, 2021, 11:14 a.m. UTC | #1
On Thu, Oct 14, 2021 at 1:04 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Some BIOS-es contain a bug where they add addresses which map to system RAM
> in the PCI bridge memory window returned by the ACPI _CRS method, see
> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> space").
>
> To avoid this Linux by default excludes E820 reservations when allocating
> addresses since 2010. Windows however ignores E820 reserved regions for PCI
> mem allocations, so in hindsight Linux honoring them is a problem.
>
> Recently (2020) some systems have shown-up with E820 reservations which
> cover the entire _CRS returned PCI bridge memory window, causing all
> attempts to assign memory to PCI BARs which have not been setup by the
> BIOS to fail. For example here are the relevant dmesg bits from a
> Lenovo IdeaPad 3 15IIL 81WE:
>
>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>
> Ideally Linux would fully stop honoring E820 reservations for PCI mem
> allocations, but then the old systems this was added for will regress.
> Instead keep the old behavior for old systems, while ignoring the E820
> reservations like Windows does for any systems from now on.
>
> Old systems are defined here as BIOS year < 2018, this was chosen to
> make sure that pci_use_e820 will not be set on the currently affected
> systems, while at the same time also taking into account that the
> systems for which the E820 checking was originally added may have
> received BIOS updates for quite a while (esp. CVE related ones),
> giving them a more recent BIOS year then 2010.
>
> Also add pci=no_e820 and pci=use_e820 options to allow overriding
> the BIOS year heuristic.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> BugLink: https://bugs.launchpad.net/bugs/1878279
> BugLink: https://bugs.launchpad.net/bugs/1931715
> BugLink: https://bugs.launchpad.net/bugs/1932069
> BugLink: https://bugs.launchpad.net/bugs/1921649
> Cc: Benoit Grégoire <benoitg@coeus.ca>
> Cc: Hui Wang <hui.wang@canonical.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

with one tiny nit below.

Or please let me know if you want me to pick this up.

> ---
> Changes in v3:
> - Commit msg tweaks (drop dmesg timestamps, typo fix)
> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
> - Add Mika's Reviewed-by
>
> Changes in v2:
> - Replace the per model DMI quirk approach with disabling E820 reservations
>   checking for all systems with a BIOS year >= 2018
> - Add documentation for the new kernel-parameters to
>   Documentation/admin-guide/kernel-parameters.txt
> ---
> Other patches trying to address the same issue:
> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
> V1 patch:
> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
> ---
>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
>  arch/x86/kernel/resource.c                    |  4 +++
>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
>  arch/x86/pci/common.c                         |  6 ++++
>  5 files changed, 55 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 43dc35fe5bc0..969cde5d74c8 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3949,6 +3949,12 @@
>                                 please report a bug.
>                 nocrs           [X86] Ignore PCI host bridge windows from ACPI.
>                                 If you need to use this, please report a bug.
> +               use_e820        [X86] Honor E820 reservations when allocating
> +                               PCI host bridge memory. If you need to use this,
> +                               please report a bug.
> +               no_e820         [X86] ignore E820 reservations when allocating
> +                               PCI host bridge memory. If you need to use this,
> +                               please report a bug.
>                 routeirq        Do IRQ routing for all PCI devices.
>                                 This is normally done in pci_enable_device(),
>                                 so this option is a temporary workaround
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 490411dba438..0bb4e7dd0ffc 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -39,6 +39,8 @@ do {                                          \
>  #define PCI_ROOT_NO_CRS                0x100000
>  #define PCI_NOASSIGN_BARS      0x200000
>  #define PCI_BIG_ROOT_WINDOW    0x400000
> +#define PCI_USE_E820           0x800000
> +#define PCI_NO_E820            0x1000000
>
>  extern unsigned int pci_probe;
>  extern unsigned long pirq_table_addr;
> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>
>  /* pci-irq.c */
>
> +struct pci_dev;
> +
>  struct irq_info {
>         u8 bus, devfn;                  /* Bus, device and function */
>         struct {
> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # define x86_default_pci_init_irq      NULL
>  # define x86_default_pci_fixup_irqs    NULL
>  #endif
> +
> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> +extern bool pci_use_e820;
> +#else
> +#define pci_use_e820 false
> +#endif
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index 9b9fb7882c20..e8dc9bc327bd 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <linux/ioport.h>
>  #include <asm/e820/api.h>
> +#include <asm/pci_x86.h>
>
>  static void resource_clip(struct resource *res, resource_size_t start,
>                           resource_size_t end)
> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
>         int i;
>         struct e820_entry *entry;
>
> +       if (!pci_use_e820)
> +               return;
> +
>         for (i = 0; i < e820_table->nr_entries; i++) {
>                 entry = &e820_table->entries[i];
>
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index 948656069cdd..6c2febe84b6f 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -21,6 +21,8 @@ struct pci_root_info {
>
>  static bool pci_use_crs = true;
>  static bool pci_ignore_seg = false;
> +/* Consumed in arch/x86/kernel/resource.c */
> +bool pci_use_e820 = false;
>
>  static int __init set_use_crs(const struct dmi_system_id *id)
>  {
> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
>                "if necessary, use \"pci=%s\" and report a bug\n",
>                pci_use_crs ? "Using" : "Ignoring",
>                pci_use_crs ? "nocrs" : "use_crs");
> +
> +       /*
> +        * Some BIOS-es contain a bug where they add addresses which map to system
> +        * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
> +        * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
> +        * To avoid this Linux by default excludes E820 reservations when allocating
> +        * addresses since 2010. Windows however ignores E820 reserved regions for
> +        * PCI mem allocations, so in hindsight Linux honoring them is a problem.
> +        * In 2020 some systems have shown-up with E820 reservations which cover the
> +        * entire _CRS returned PCI bridge memory window, causing all attempts to
> +        * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
> +        *
> +        * Ideally Linux would fully stop honoring E820 reservations for PCI mem
> +        * allocations, but then the old systems this was added for will regress.
> +        * Instead keep the old behavior for old systems, while ignoring the E820
> +        * reservations like Windows does for any systems from now on.
> +        */
> +       if (year >= 0 && year < 2018)
> +               pci_use_e820 = true;
> +
> +       if (pci_probe & PCI_NO_E820)
> +               pci_use_e820 = false;
> +       else if (pci_probe & PCI_USE_E820)
> +               pci_use_e820 = true;
> +
> +       printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
> +              pci_use_e820 ? "Honoring" : "Ignoring");

Why not pr_info()?

>  }
>
>  #ifdef CONFIG_PCI_MMCONFIG
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 3507f456fcd0..091ec7e94fcb 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
>         } else if (!strcmp(str, "nocrs")) {
>                 pci_probe |= PCI_ROOT_NO_CRS;
>                 return NULL;
> +       } else if (!strcmp(str, "use_e820")) {
> +               pci_probe |= PCI_USE_E820;
> +               return NULL;
> +       } else if (!strcmp(str, "no_e820")) {
> +               pci_probe |= PCI_NO_E820;
> +               return NULL;
>  #ifdef CONFIG_PHYS_ADDR_T_64BIT
>         } else if (!strcmp(str, "big_root_window")) {
>                 pci_probe |= PCI_BIG_ROOT_WINDOW;
> --
Hans de Goede Oct. 14, 2021, 11:24 a.m. UTC | #2
Hi,

On 14-10-2021 13:14, Rafael J. Wysocki wrote:
> On Thu, Oct 14, 2021 at 1:04 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Some BIOS-es contain a bug where they add addresses which map to system RAM
>> in the PCI bridge memory window returned by the ACPI _CRS method, see
>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>> space").
>>
>> To avoid this Linux by default excludes E820 reservations when allocating
>> addresses since 2010. Windows however ignores E820 reserved regions for PCI
>> mem allocations, so in hindsight Linux honoring them is a problem.
>>
>> Recently (2020) some systems have shown-up with E820 reservations which
>> cover the entire _CRS returned PCI bridge memory window, causing all
>> attempts to assign memory to PCI BARs which have not been setup by the
>> BIOS to fail. For example here are the relevant dmesg bits from a
>> Lenovo IdeaPad 3 15IIL 81WE:
>>
>>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>>
>> Ideally Linux would fully stop honoring E820 reservations for PCI mem
>> allocations, but then the old systems this was added for will regress.
>> Instead keep the old behavior for old systems, while ignoring the E820
>> reservations like Windows does for any systems from now on.
>>
>> Old systems are defined here as BIOS year < 2018, this was chosen to
>> make sure that pci_use_e820 will not be set on the currently affected
>> systems, while at the same time also taking into account that the
>> systems for which the E820 checking was originally added may have
>> received BIOS updates for quite a while (esp. CVE related ones),
>> giving them a more recent BIOS year then 2010.
>>
>> Also add pci=no_e820 and pci=use_e820 options to allow overriding
>> the BIOS year heuristic.
>>
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
>> BugLink: https://bugs.launchpad.net/bugs/1878279
>> BugLink: https://bugs.launchpad.net/bugs/1931715
>> BugLink: https://bugs.launchpad.net/bugs/1932069
>> BugLink: https://bugs.launchpad.net/bugs/1921649
>> Cc: Benoit Grégoire <benoitg@coeus.ca>
>> Cc: Hui Wang <hui.wang@canonical.com>
>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> 
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thank you.

> with one tiny nit below.
> 
> Or please let me know if you want me to pick this up.

Since all of the changes are under arch/x86/ I expect the x86/tip
folks to pick this up ?

> 
>> ---
>> Changes in v3:
>> - Commit msg tweaks (drop dmesg timestamps, typo fix)
>> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
>> - Add Mika's Reviewed-by
>>
>> Changes in v2:
>> - Replace the per model DMI quirk approach with disabling E820 reservations
>>   checking for all systems with a BIOS year >= 2018
>> - Add documentation for the new kernel-parameters to
>>   Documentation/admin-guide/kernel-parameters.txt
>> ---
>> Other patches trying to address the same issue:
>> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
>> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
>> V1 patch:
>> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
>>  arch/x86/kernel/resource.c                    |  4 +++
>>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
>>  arch/x86/pci/common.c                         |  6 ++++
>>  5 files changed, 55 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 43dc35fe5bc0..969cde5d74c8 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3949,6 +3949,12 @@
>>                                 please report a bug.
>>                 nocrs           [X86] Ignore PCI host bridge windows from ACPI.
>>                                 If you need to use this, please report a bug.
>> +               use_e820        [X86] Honor E820 reservations when allocating
>> +                               PCI host bridge memory. If you need to use this,
>> +                               please report a bug.
>> +               no_e820         [X86] ignore E820 reservations when allocating
>> +                               PCI host bridge memory. If you need to use this,
>> +                               please report a bug.
>>                 routeirq        Do IRQ routing for all PCI devices.
>>                                 This is normally done in pci_enable_device(),
>>                                 so this option is a temporary workaround
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index 490411dba438..0bb4e7dd0ffc 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -39,6 +39,8 @@ do {                                          \
>>  #define PCI_ROOT_NO_CRS                0x100000
>>  #define PCI_NOASSIGN_BARS      0x200000
>>  #define PCI_BIG_ROOT_WINDOW    0x400000
>> +#define PCI_USE_E820           0x800000
>> +#define PCI_NO_E820            0x1000000
>>
>>  extern unsigned int pci_probe;
>>  extern unsigned long pirq_table_addr;
>> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>>
>>  /* pci-irq.c */
>>
>> +struct pci_dev;
>> +
>>  struct irq_info {
>>         u8 bus, devfn;                  /* Bus, device and function */
>>         struct {
>> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>>  # define x86_default_pci_init_irq      NULL
>>  # define x86_default_pci_fixup_irqs    NULL
>>  #endif
>> +
>> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
>> +extern bool pci_use_e820;
>> +#else
>> +#define pci_use_e820 false
>> +#endif
>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>> index 9b9fb7882c20..e8dc9bc327bd 100644
>> --- a/arch/x86/kernel/resource.c
>> +++ b/arch/x86/kernel/resource.c
>> @@ -1,6 +1,7 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include <linux/ioport.h>
>>  #include <asm/e820/api.h>
>> +#include <asm/pci_x86.h>
>>
>>  static void resource_clip(struct resource *res, resource_size_t start,
>>                           resource_size_t end)
>> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
>>         int i;
>>         struct e820_entry *entry;
>>
>> +       if (!pci_use_e820)
>> +               return;
>> +
>>         for (i = 0; i < e820_table->nr_entries; i++) {
>>                 entry = &e820_table->entries[i];
>>
>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>> index 948656069cdd..6c2febe84b6f 100644
>> --- a/arch/x86/pci/acpi.c
>> +++ b/arch/x86/pci/acpi.c
>> @@ -21,6 +21,8 @@ struct pci_root_info {
>>
>>  static bool pci_use_crs = true;
>>  static bool pci_ignore_seg = false;
>> +/* Consumed in arch/x86/kernel/resource.c */
>> +bool pci_use_e820 = false;
>>
>>  static int __init set_use_crs(const struct dmi_system_id *id)
>>  {
>> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
>>                "if necessary, use \"pci=%s\" and report a bug\n",
>>                pci_use_crs ? "Using" : "Ignoring",
>>                pci_use_crs ? "nocrs" : "use_crs");
>> +
>> +       /*
>> +        * Some BIOS-es contain a bug where they add addresses which map to system
>> +        * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
>> +        * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
>> +        * To avoid this Linux by default excludes E820 reservations when allocating
>> +        * addresses since 2010. Windows however ignores E820 reserved regions for
>> +        * PCI mem allocations, so in hindsight Linux honoring them is a problem.
>> +        * In 2020 some systems have shown-up with E820 reservations which cover the
>> +        * entire _CRS returned PCI bridge memory window, causing all attempts to
>> +        * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
>> +        *
>> +        * Ideally Linux would fully stop honoring E820 reservations for PCI mem
>> +        * allocations, but then the old systems this was added for will regress.
>> +        * Instead keep the old behavior for old systems, while ignoring the E820
>> +        * reservations like Windows does for any systems from now on.
>> +        */
>> +       if (year >= 0 && year < 2018)
>> +               pci_use_e820 = true;
>> +
>> +       if (pci_probe & PCI_NO_E820)
>> +               pci_use_e820 = false;
>> +       else if (pci_probe & PCI_USE_E820)
>> +               pci_use_e820 = true;
>> +
>> +       printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
>> +              pci_use_e820 ? "Honoring" : "Ignoring");
> 
> Why not pr_info()?

This file is using printk(KERN_... consistently everywhere. I'm just following
the existing style here. I very much dislike mixing styles in a single file. 

If we want to change this for this file then IMHO the right thing to do would
be a follow up patch changing all the printk-s at once.

Regards,

Hans



> 
>>  }
>>
>>  #ifdef CONFIG_PCI_MMCONFIG
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index 3507f456fcd0..091ec7e94fcb 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
>>         } else if (!strcmp(str, "nocrs")) {
>>                 pci_probe |= PCI_ROOT_NO_CRS;
>>                 return NULL;
>> +       } else if (!strcmp(str, "use_e820")) {
>> +               pci_probe |= PCI_USE_E820;
>> +               return NULL;
>> +       } else if (!strcmp(str, "no_e820")) {
>> +               pci_probe |= PCI_NO_E820;
>> +               return NULL;
>>  #ifdef CONFIG_PHYS_ADDR_T_64BIT
>>         } else if (!strcmp(str, "big_root_window")) {
>>                 pci_probe |= PCI_BIG_ROOT_WINDOW;
>> --
>
Rafael J. Wysocki Oct. 14, 2021, 12:06 p.m. UTC | #3
On Thu, Oct 14, 2021 at 1:24 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 14-10-2021 13:14, Rafael J. Wysocki wrote:
> > On Thu, Oct 14, 2021 at 1:04 PM Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Some BIOS-es contain a bug where they add addresses which map to system RAM
> >> in the PCI bridge memory window returned by the ACPI _CRS method, see
> >> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >> space").
> >>
> >> To avoid this Linux by default excludes E820 reservations when allocating
> >> addresses since 2010. Windows however ignores E820 reserved regions for PCI
> >> mem allocations, so in hindsight Linux honoring them is a problem.
> >>
> >> Recently (2020) some systems have shown-up with E820 reservations which
> >> cover the entire _CRS returned PCI bridge memory window, causing all
> >> attempts to assign memory to PCI BARs which have not been setup by the
> >> BIOS to fail. For example here are the relevant dmesg bits from a
> >> Lenovo IdeaPad 3 15IIL 81WE:
> >>
> >>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
> >>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
> >>
> >> Ideally Linux would fully stop honoring E820 reservations for PCI mem
> >> allocations, but then the old systems this was added for will regress.
> >> Instead keep the old behavior for old systems, while ignoring the E820
> >> reservations like Windows does for any systems from now on.
> >>
> >> Old systems are defined here as BIOS year < 2018, this was chosen to
> >> make sure that pci_use_e820 will not be set on the currently affected
> >> systems, while at the same time also taking into account that the
> >> systems for which the E820 checking was originally added may have
> >> received BIOS updates for quite a while (esp. CVE related ones),
> >> giving them a more recent BIOS year then 2010.
> >>
> >> Also add pci=no_e820 and pci=use_e820 options to allow overriding
> >> the BIOS year heuristic.
> >>
> >> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> >> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> >> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> >> BugLink: https://bugs.launchpad.net/bugs/1878279
> >> BugLink: https://bugs.launchpad.net/bugs/1931715
> >> BugLink: https://bugs.launchpad.net/bugs/1932069
> >> BugLink: https://bugs.launchpad.net/bugs/1921649
> >> Cc: Benoit Grégoire <benoitg@coeus.ca>
> >> Cc: Hui Wang <hui.wang@canonical.com>
> >> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >
> > Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Thank you.
>
> > with one tiny nit below.
> >
> > Or please let me know if you want me to pick this up.
>
> Since all of the changes are under arch/x86/ I expect the x86/tip
> folks to pick this up ?

OK

> >
> >> ---
> >> Changes in v3:
> >> - Commit msg tweaks (drop dmesg timestamps, typo fix)
> >> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
> >> - Add Mika's Reviewed-by
> >>
> >> Changes in v2:
> >> - Replace the per model DMI quirk approach with disabling E820 reservations
> >>   checking for all systems with a BIOS year >= 2018
> >> - Add documentation for the new kernel-parameters to
> >>   Documentation/admin-guide/kernel-parameters.txt
> >> ---
> >> Other patches trying to address the same issue:
> >> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
> >> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
> >> V1 patch:
> >> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
> >> ---
> >>  .../admin-guide/kernel-parameters.txt         |  6 ++++
> >>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
> >>  arch/x86/kernel/resource.c                    |  4 +++
> >>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
> >>  arch/x86/pci/common.c                         |  6 ++++
> >>  5 files changed, 55 insertions(+)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 43dc35fe5bc0..969cde5d74c8 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -3949,6 +3949,12 @@
> >>                                 please report a bug.
> >>                 nocrs           [X86] Ignore PCI host bridge windows from ACPI.
> >>                                 If you need to use this, please report a bug.
> >> +               use_e820        [X86] Honor E820 reservations when allocating
> >> +                               PCI host bridge memory. If you need to use this,
> >> +                               please report a bug.
> >> +               no_e820         [X86] ignore E820 reservations when allocating
> >> +                               PCI host bridge memory. If you need to use this,
> >> +                               please report a bug.
> >>                 routeirq        Do IRQ routing for all PCI devices.
> >>                                 This is normally done in pci_enable_device(),
> >>                                 so this option is a temporary workaround
> >> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> >> index 490411dba438..0bb4e7dd0ffc 100644
> >> --- a/arch/x86/include/asm/pci_x86.h
> >> +++ b/arch/x86/include/asm/pci_x86.h
> >> @@ -39,6 +39,8 @@ do {                                          \
> >>  #define PCI_ROOT_NO_CRS                0x100000
> >>  #define PCI_NOASSIGN_BARS      0x200000
> >>  #define PCI_BIG_ROOT_WINDOW    0x400000
> >> +#define PCI_USE_E820           0x800000
> >> +#define PCI_NO_E820            0x1000000
> >>
> >>  extern unsigned int pci_probe;
> >>  extern unsigned long pirq_table_addr;
> >> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
> >>
> >>  /* pci-irq.c */
> >>
> >> +struct pci_dev;
> >> +
> >>  struct irq_info {
> >>         u8 bus, devfn;                  /* Bus, device and function */
> >>         struct {
> >> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
> >>  # define x86_default_pci_init_irq      NULL
> >>  # define x86_default_pci_fixup_irqs    NULL
> >>  #endif
> >> +
> >> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> >> +extern bool pci_use_e820;
> >> +#else
> >> +#define pci_use_e820 false
> >> +#endif
> >> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> >> index 9b9fb7882c20..e8dc9bc327bd 100644
> >> --- a/arch/x86/kernel/resource.c
> >> +++ b/arch/x86/kernel/resource.c
> >> @@ -1,6 +1,7 @@
> >>  // SPDX-License-Identifier: GPL-2.0
> >>  #include <linux/ioport.h>
> >>  #include <asm/e820/api.h>
> >> +#include <asm/pci_x86.h>
> >>
> >>  static void resource_clip(struct resource *res, resource_size_t start,
> >>                           resource_size_t end)
> >> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
> >>         int i;
> >>         struct e820_entry *entry;
> >>
> >> +       if (!pci_use_e820)
> >> +               return;
> >> +
> >>         for (i = 0; i < e820_table->nr_entries; i++) {
> >>                 entry = &e820_table->entries[i];
> >>
> >> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> >> index 948656069cdd..6c2febe84b6f 100644
> >> --- a/arch/x86/pci/acpi.c
> >> +++ b/arch/x86/pci/acpi.c
> >> @@ -21,6 +21,8 @@ struct pci_root_info {
> >>
> >>  static bool pci_use_crs = true;
> >>  static bool pci_ignore_seg = false;
> >> +/* Consumed in arch/x86/kernel/resource.c */
> >> +bool pci_use_e820 = false;
> >>
> >>  static int __init set_use_crs(const struct dmi_system_id *id)
> >>  {
> >> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
> >>                "if necessary, use \"pci=%s\" and report a bug\n",
> >>                pci_use_crs ? "Using" : "Ignoring",
> >>                pci_use_crs ? "nocrs" : "use_crs");
> >> +
> >> +       /*
> >> +        * Some BIOS-es contain a bug where they add addresses which map to system
> >> +        * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
> >> +        * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
> >> +        * To avoid this Linux by default excludes E820 reservations when allocating
> >> +        * addresses since 2010. Windows however ignores E820 reserved regions for
> >> +        * PCI mem allocations, so in hindsight Linux honoring them is a problem.
> >> +        * In 2020 some systems have shown-up with E820 reservations which cover the
> >> +        * entire _CRS returned PCI bridge memory window, causing all attempts to
> >> +        * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
> >> +        *
> >> +        * Ideally Linux would fully stop honoring E820 reservations for PCI mem
> >> +        * allocations, but then the old systems this was added for will regress.
> >> +        * Instead keep the old behavior for old systems, while ignoring the E820
> >> +        * reservations like Windows does for any systems from now on.
> >> +        */
> >> +       if (year >= 0 && year < 2018)
> >> +               pci_use_e820 = true;
> >> +
> >> +       if (pci_probe & PCI_NO_E820)
> >> +               pci_use_e820 = false;
> >> +       else if (pci_probe & PCI_USE_E820)
> >> +               pci_use_e820 = true;
> >> +
> >> +       printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
> >> +              pci_use_e820 ? "Honoring" : "Ignoring");
> >
> > Why not pr_info()?
>
> This file is using printk(KERN_... consistently everywhere. I'm just following
> the existing style here. I very much dislike mixing styles in a single file.

In this particular case, it isn't just a matter of style.

Also, if what is regarded as a good practice has changed since the
file was created, should new code added to it be prevented from
following the new good practice, because the old code didn't follow
it?

> If we want to change this for this file then IMHO the right thing to do would
> be a follow up patch changing all the printk-s at once.

I would do the pr_info() here in this patch and change the rest of the
file to follow in a subsequent patch.
Christoph Hellwig Oct. 14, 2021, 1:40 p.m. UTC | #4
On Thu, Oct 14, 2021 at 01:03:57PM +0200, Hans de Goede wrote:
> +	 * Some BIOS-es contain a bug where they add addresses which map to system
> +	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method,

Please avoid the overly long lines in your comments.
Hans de Goede Oct. 14, 2021, 2:21 p.m. UTC | #5
Hi,

On 10/14/21 2:06 PM, Rafael J. Wysocki wrote:
> On Thu, Oct 14, 2021 at 1:24 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 14-10-2021 13:14, Rafael J. Wysocki wrote:
>>> On Thu, Oct 14, 2021 at 1:04 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Some BIOS-es contain a bug where they add addresses which map to system RAM
>>>> in the PCI bridge memory window returned by the ACPI _CRS method, see
>>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>>>> space").
>>>>
>>>> To avoid this Linux by default excludes E820 reservations when allocating
>>>> addresses since 2010. Windows however ignores E820 reserved regions for PCI
>>>> mem allocations, so in hindsight Linux honoring them is a problem.
>>>>
>>>> Recently (2020) some systems have shown-up with E820 reservations which
>>>> cover the entire _CRS returned PCI bridge memory window, causing all
>>>> attempts to assign memory to PCI BARs which have not been setup by the
>>>> BIOS to fail. For example here are the relevant dmesg bits from a
>>>> Lenovo IdeaPad 3 15IIL 81WE:
>>>>
>>>>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>>>>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>>>>
>>>> Ideally Linux would fully stop honoring E820 reservations for PCI mem
>>>> allocations, but then the old systems this was added for will regress.
>>>> Instead keep the old behavior for old systems, while ignoring the E820
>>>> reservations like Windows does for any systems from now on.
>>>>
>>>> Old systems are defined here as BIOS year < 2018, this was chosen to
>>>> make sure that pci_use_e820 will not be set on the currently affected
>>>> systems, while at the same time also taking into account that the
>>>> systems for which the E820 checking was originally added may have
>>>> received BIOS updates for quite a while (esp. CVE related ones),
>>>> giving them a more recent BIOS year then 2010.
>>>>
>>>> Also add pci=no_e820 and pci=use_e820 options to allow overriding
>>>> the BIOS year heuristic.
>>>>
>>>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
>>>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>>>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
>>>> BugLink: https://bugs.launchpad.net/bugs/1878279
>>>> BugLink: https://bugs.launchpad.net/bugs/1931715
>>>> BugLink: https://bugs.launchpad.net/bugs/1932069
>>>> BugLink: https://bugs.launchpad.net/bugs/1921649
>>>> Cc: Benoit Grégoire <benoitg@coeus.ca>
>>>> Cc: Hui Wang <hui.wang@canonical.com>
>>>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>
>>> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Thank you.
>>
>>> with one tiny nit below.
>>>
>>> Or please let me know if you want me to pick this up.
>>
>> Since all of the changes are under arch/x86/ I expect the x86/tip
>> folks to pick this up ?
> 
> OK
> 
>>>
>>>> ---
>>>> Changes in v3:
>>>> - Commit msg tweaks (drop dmesg timestamps, typo fix)
>>>> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
>>>> - Add Mika's Reviewed-by
>>>>
>>>> Changes in v2:
>>>> - Replace the per model DMI quirk approach with disabling E820 reservations
>>>>   checking for all systems with a BIOS year >= 2018
>>>> - Add documentation for the new kernel-parameters to
>>>>   Documentation/admin-guide/kernel-parameters.txt
>>>> ---
>>>> Other patches trying to address the same issue:
>>>> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
>>>> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
>>>> V1 patch:
>>>> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
>>>> ---
>>>>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>>>>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
>>>>  arch/x86/kernel/resource.c                    |  4 +++
>>>>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
>>>>  arch/x86/pci/common.c                         |  6 ++++
>>>>  5 files changed, 55 insertions(+)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index 43dc35fe5bc0..969cde5d74c8 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -3949,6 +3949,12 @@
>>>>                                 please report a bug.
>>>>                 nocrs           [X86] Ignore PCI host bridge windows from ACPI.
>>>>                                 If you need to use this, please report a bug.
>>>> +               use_e820        [X86] Honor E820 reservations when allocating
>>>> +                               PCI host bridge memory. If you need to use this,
>>>> +                               please report a bug.
>>>> +               no_e820         [X86] ignore E820 reservations when allocating
>>>> +                               PCI host bridge memory. If you need to use this,
>>>> +                               please report a bug.
>>>>                 routeirq        Do IRQ routing for all PCI devices.
>>>>                                 This is normally done in pci_enable_device(),
>>>>                                 so this option is a temporary workaround
>>>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>>>> index 490411dba438..0bb4e7dd0ffc 100644
>>>> --- a/arch/x86/include/asm/pci_x86.h
>>>> +++ b/arch/x86/include/asm/pci_x86.h
>>>> @@ -39,6 +39,8 @@ do {                                          \
>>>>  #define PCI_ROOT_NO_CRS                0x100000
>>>>  #define PCI_NOASSIGN_BARS      0x200000
>>>>  #define PCI_BIG_ROOT_WINDOW    0x400000
>>>> +#define PCI_USE_E820           0x800000
>>>> +#define PCI_NO_E820            0x1000000
>>>>
>>>>  extern unsigned int pci_probe;
>>>>  extern unsigned long pirq_table_addr;
>>>> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>>>>
>>>>  /* pci-irq.c */
>>>>
>>>> +struct pci_dev;
>>>> +
>>>>  struct irq_info {
>>>>         u8 bus, devfn;                  /* Bus, device and function */
>>>>         struct {
>>>> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>>>>  # define x86_default_pci_init_irq      NULL
>>>>  # define x86_default_pci_fixup_irqs    NULL
>>>>  #endif
>>>> +
>>>> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
>>>> +extern bool pci_use_e820;
>>>> +#else
>>>> +#define pci_use_e820 false
>>>> +#endif
>>>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>>>> index 9b9fb7882c20..e8dc9bc327bd 100644
>>>> --- a/arch/x86/kernel/resource.c
>>>> +++ b/arch/x86/kernel/resource.c
>>>> @@ -1,6 +1,7 @@
>>>>  // SPDX-License-Identifier: GPL-2.0
>>>>  #include <linux/ioport.h>
>>>>  #include <asm/e820/api.h>
>>>> +#include <asm/pci_x86.h>
>>>>
>>>>  static void resource_clip(struct resource *res, resource_size_t start,
>>>>                           resource_size_t end)
>>>> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
>>>>         int i;
>>>>         struct e820_entry *entry;
>>>>
>>>> +       if (!pci_use_e820)
>>>> +               return;
>>>> +
>>>>         for (i = 0; i < e820_table->nr_entries; i++) {
>>>>                 entry = &e820_table->entries[i];
>>>>
>>>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>>>> index 948656069cdd..6c2febe84b6f 100644
>>>> --- a/arch/x86/pci/acpi.c
>>>> +++ b/arch/x86/pci/acpi.c
>>>> @@ -21,6 +21,8 @@ struct pci_root_info {
>>>>
>>>>  static bool pci_use_crs = true;
>>>>  static bool pci_ignore_seg = false;
>>>> +/* Consumed in arch/x86/kernel/resource.c */
>>>> +bool pci_use_e820 = false;
>>>>
>>>>  static int __init set_use_crs(const struct dmi_system_id *id)
>>>>  {
>>>> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
>>>>                "if necessary, use \"pci=%s\" and report a bug\n",
>>>>                pci_use_crs ? "Using" : "Ignoring",
>>>>                pci_use_crs ? "nocrs" : "use_crs");
>>>> +
>>>> +       /*
>>>> +        * Some BIOS-es contain a bug where they add addresses which map to system
>>>> +        * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
>>>> +        * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
>>>> +        * To avoid this Linux by default excludes E820 reservations when allocating
>>>> +        * addresses since 2010. Windows however ignores E820 reserved regions for
>>>> +        * PCI mem allocations, so in hindsight Linux honoring them is a problem.
>>>> +        * In 2020 some systems have shown-up with E820 reservations which cover the
>>>> +        * entire _CRS returned PCI bridge memory window, causing all attempts to
>>>> +        * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
>>>> +        *
>>>> +        * Ideally Linux would fully stop honoring E820 reservations for PCI mem
>>>> +        * allocations, but then the old systems this was added for will regress.
>>>> +        * Instead keep the old behavior for old systems, while ignoring the E820
>>>> +        * reservations like Windows does for any systems from now on.
>>>> +        */
>>>> +       if (year >= 0 && year < 2018)
>>>> +               pci_use_e820 = true;
>>>> +
>>>> +       if (pci_probe & PCI_NO_E820)
>>>> +               pci_use_e820 = false;
>>>> +       else if (pci_probe & PCI_USE_E820)
>>>> +               pci_use_e820 = true;
>>>> +
>>>> +       printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
>>>> +              pci_use_e820 ? "Honoring" : "Ignoring");
>>>
>>> Why not pr_info()?
>>
>> This file is using printk(KERN_... consistently everywhere. I'm just following
>> the existing style here. I very much dislike mixing styles in a single file.
> 
> In this particular case, it isn't just a matter of style.

Without a #define pr_fmt in the file there is no functional difference.

> Also, if what is regarded as a good practice has changed since the
> file was created, should new code added to it be prevented from
> following the new good practice, because the old code didn't follow
> it?

That is a non trivial question to answer, e.g. using devm_ functions
while the rest of the driver is not using them can be tricky and it
might be better to convert the whole driver over to devm_ use in one go.

>> If we want to change this for this file then IMHO the right thing to do would
>> be a follow up patch changing all the printk-s at once.
> 
> I would do the pr_info() here in this patch and change the rest of the
> file to follow in a subsequent patch.

All printk's in this file are prefixed with "PCI: " so converting to
pr_info() should probably involve adding this:

#define pr_fmt(fmt) "PCI: " fmt

So should I add that already while using pr_info() in this patch,
which would look weird / look like an unrelated change?

Or should I not add that and manually add the "PCI: " prefix,
requiring the pr_info to still be replaced in a subsequent patch
converting the rest over to pr_info() ?

IMHO it makes the most sense to keep printk here and then
replace the printk with a pr_info, dropping the "PCI: "
prefix in a subsequent patch converting all the printk-s.

That would also make the subsequent patch cleaner, because
replacing a pr_info with a pr_info in that patch would
look weird.

Regards,

Hans
Rafael J. Wysocki Oct. 14, 2021, 2:53 p.m. UTC | #6
On Thu, Oct 14, 2021 at 4:21 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 10/14/21 2:06 PM, Rafael J. Wysocki wrote:
> > On Thu, Oct 14, 2021 at 1:24 PM Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> On 14-10-2021 13:14, Rafael J. Wysocki wrote:
> >>> On Thu, Oct 14, 2021 at 1:04 PM Hans de Goede <hdegoede@redhat.com> wrote:
> >>>>
> >>>> Some BIOS-es contain a bug where they add addresses which map to system RAM
> >>>> in the PCI bridge memory window returned by the ACPI _CRS method, see
> >>>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >>>> space").
> >>>>
> >>>> To avoid this Linux by default excludes E820 reservations when allocating
> >>>> addresses since 2010. Windows however ignores E820 reserved regions for PCI
> >>>> mem allocations, so in hindsight Linux honoring them is a problem.
> >>>>
> >>>> Recently (2020) some systems have shown-up with E820 reservations which
> >>>> cover the entire _CRS returned PCI bridge memory window, causing all
> >>>> attempts to assign memory to PCI BARs which have not been setup by the
> >>>> BIOS to fail. For example here are the relevant dmesg bits from a
> >>>> Lenovo IdeaPad 3 15IIL 81WE:
> >>>>
> >>>>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
> >>>>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
> >>>>
> >>>> Ideally Linux would fully stop honoring E820 reservations for PCI mem
> >>>> allocations, but then the old systems this was added for will regress.
> >>>> Instead keep the old behavior for old systems, while ignoring the E820
> >>>> reservations like Windows does for any systems from now on.
> >>>>
> >>>> Old systems are defined here as BIOS year < 2018, this was chosen to
> >>>> make sure that pci_use_e820 will not be set on the currently affected
> >>>> systems, while at the same time also taking into account that the
> >>>> systems for which the E820 checking was originally added may have
> >>>> received BIOS updates for quite a while (esp. CVE related ones),
> >>>> giving them a more recent BIOS year then 2010.
> >>>>
> >>>> Also add pci=no_e820 and pci=use_e820 options to allow overriding
> >>>> the BIOS year heuristic.
> >>>>
> >>>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> >>>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> >>>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> >>>> BugLink: https://bugs.launchpad.net/bugs/1878279
> >>>> BugLink: https://bugs.launchpad.net/bugs/1931715
> >>>> BugLink: https://bugs.launchpad.net/bugs/1932069
> >>>> BugLink: https://bugs.launchpad.net/bugs/1921649
> >>>> Cc: Benoit Grégoire <benoitg@coeus.ca>
> >>>> Cc: Hui Wang <hui.wang@canonical.com>
> >>>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >>>
> >>> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> Thank you.
> >>
> >>> with one tiny nit below.
> >>>
> >>> Or please let me know if you want me to pick this up.
> >>
> >> Since all of the changes are under arch/x86/ I expect the x86/tip
> >> folks to pick this up ?
> >
> > OK
> >
> >>>
> >>>> ---
> >>>> Changes in v3:
> >>>> - Commit msg tweaks (drop dmesg timestamps, typo fix)
> >>>> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
> >>>> - Add Mika's Reviewed-by
> >>>>
> >>>> Changes in v2:
> >>>> - Replace the per model DMI quirk approach with disabling E820 reservations
> >>>>   checking for all systems with a BIOS year >= 2018
> >>>> - Add documentation for the new kernel-parameters to
> >>>>   Documentation/admin-guide/kernel-parameters.txt
> >>>> ---
> >>>> Other patches trying to address the same issue:
> >>>> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
> >>>> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
> >>>> V1 patch:
> >>>> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
> >>>> ---
> >>>>  .../admin-guide/kernel-parameters.txt         |  6 ++++
> >>>>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
> >>>>  arch/x86/kernel/resource.c                    |  4 +++
> >>>>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
> >>>>  arch/x86/pci/common.c                         |  6 ++++
> >>>>  5 files changed, 55 insertions(+)
> >>>>
> >>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >>>> index 43dc35fe5bc0..969cde5d74c8 100644
> >>>> --- a/Documentation/admin-guide/kernel-parameters.txt
> >>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >>>> @@ -3949,6 +3949,12 @@
> >>>>                                 please report a bug.
> >>>>                 nocrs           [X86] Ignore PCI host bridge windows from ACPI.
> >>>>                                 If you need to use this, please report a bug.
> >>>> +               use_e820        [X86] Honor E820 reservations when allocating
> >>>> +                               PCI host bridge memory. If you need to use this,
> >>>> +                               please report a bug.
> >>>> +               no_e820         [X86] ignore E820 reservations when allocating
> >>>> +                               PCI host bridge memory. If you need to use this,
> >>>> +                               please report a bug.
> >>>>                 routeirq        Do IRQ routing for all PCI devices.
> >>>>                                 This is normally done in pci_enable_device(),
> >>>>                                 so this option is a temporary workaround
> >>>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> >>>> index 490411dba438..0bb4e7dd0ffc 100644
> >>>> --- a/arch/x86/include/asm/pci_x86.h
> >>>> +++ b/arch/x86/include/asm/pci_x86.h
> >>>> @@ -39,6 +39,8 @@ do {                                          \
> >>>>  #define PCI_ROOT_NO_CRS                0x100000
> >>>>  #define PCI_NOASSIGN_BARS      0x200000
> >>>>  #define PCI_BIG_ROOT_WINDOW    0x400000
> >>>> +#define PCI_USE_E820           0x800000
> >>>> +#define PCI_NO_E820            0x1000000
> >>>>
> >>>>  extern unsigned int pci_probe;
> >>>>  extern unsigned long pirq_table_addr;
> >>>> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
> >>>>
> >>>>  /* pci-irq.c */
> >>>>
> >>>> +struct pci_dev;
> >>>> +
> >>>>  struct irq_info {
> >>>>         u8 bus, devfn;                  /* Bus, device and function */
> >>>>         struct {
> >>>> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
> >>>>  # define x86_default_pci_init_irq      NULL
> >>>>  # define x86_default_pci_fixup_irqs    NULL
> >>>>  #endif
> >>>> +
> >>>> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> >>>> +extern bool pci_use_e820;
> >>>> +#else
> >>>> +#define pci_use_e820 false
> >>>> +#endif
> >>>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> >>>> index 9b9fb7882c20..e8dc9bc327bd 100644
> >>>> --- a/arch/x86/kernel/resource.c
> >>>> +++ b/arch/x86/kernel/resource.c
> >>>> @@ -1,6 +1,7 @@
> >>>>  // SPDX-License-Identifier: GPL-2.0
> >>>>  #include <linux/ioport.h>
> >>>>  #include <asm/e820/api.h>
> >>>> +#include <asm/pci_x86.h>
> >>>>
> >>>>  static void resource_clip(struct resource *res, resource_size_t start,
> >>>>                           resource_size_t end)
> >>>> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
> >>>>         int i;
> >>>>         struct e820_entry *entry;
> >>>>
> >>>> +       if (!pci_use_e820)
> >>>> +               return;
> >>>> +
> >>>>         for (i = 0; i < e820_table->nr_entries; i++) {
> >>>>                 entry = &e820_table->entries[i];
> >>>>
> >>>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> >>>> index 948656069cdd..6c2febe84b6f 100644
> >>>> --- a/arch/x86/pci/acpi.c
> >>>> +++ b/arch/x86/pci/acpi.c
> >>>> @@ -21,6 +21,8 @@ struct pci_root_info {
> >>>>
> >>>>  static bool pci_use_crs = true;
> >>>>  static bool pci_ignore_seg = false;
> >>>> +/* Consumed in arch/x86/kernel/resource.c */
> >>>> +bool pci_use_e820 = false;
> >>>>
> >>>>  static int __init set_use_crs(const struct dmi_system_id *id)
> >>>>  {
> >>>> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
> >>>>                "if necessary, use \"pci=%s\" and report a bug\n",
> >>>>                pci_use_crs ? "Using" : "Ignoring",
> >>>>                pci_use_crs ? "nocrs" : "use_crs");
> >>>> +
> >>>> +       /*
> >>>> +        * Some BIOS-es contain a bug where they add addresses which map to system
> >>>> +        * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
> >>>> +        * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
> >>>> +        * To avoid this Linux by default excludes E820 reservations when allocating
> >>>> +        * addresses since 2010. Windows however ignores E820 reserved regions for
> >>>> +        * PCI mem allocations, so in hindsight Linux honoring them is a problem.
> >>>> +        * In 2020 some systems have shown-up with E820 reservations which cover the
> >>>> +        * entire _CRS returned PCI bridge memory window, causing all attempts to
> >>>> +        * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
> >>>> +        *
> >>>> +        * Ideally Linux would fully stop honoring E820 reservations for PCI mem
> >>>> +        * allocations, but then the old systems this was added for will regress.
> >>>> +        * Instead keep the old behavior for old systems, while ignoring the E820
> >>>> +        * reservations like Windows does for any systems from now on.
> >>>> +        */
> >>>> +       if (year >= 0 && year < 2018)
> >>>> +               pci_use_e820 = true;
> >>>> +
> >>>> +       if (pci_probe & PCI_NO_E820)
> >>>> +               pci_use_e820 = false;
> >>>> +       else if (pci_probe & PCI_USE_E820)
> >>>> +               pci_use_e820 = true;
> >>>> +
> >>>> +       printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
> >>>> +              pci_use_e820 ? "Honoring" : "Ignoring");
> >>>
> >>> Why not pr_info()?
> >>
> >> This file is using printk(KERN_... consistently everywhere. I'm just following
> >> the existing style here. I very much dislike mixing styles in a single file.
> >
> > In this particular case, it isn't just a matter of style.
>
> Without a #define pr_fmt in the file there is no functional difference.
>
> > Also, if what is regarded as a good practice has changed since the
> > file was created, should new code added to it be prevented from
> > following the new good practice, because the old code didn't follow
> > it?
>
> That is a non trivial question to answer, e.g. using devm_ functions
> while the rest of the driver is not using them can be tricky and it
> might be better to convert the whole driver over to devm_ use in one go.

Right.

> >> If we want to change this for this file then IMHO the right thing to do would
> >> be a follow up patch changing all the printk-s at once.
> >
> > I would do the pr_info() here in this patch and change the rest of the
> > file to follow in a subsequent patch.
>
> All printk's in this file are prefixed with "PCI: " so converting to
> pr_info() should probably involve adding this:
>
> #define pr_fmt(fmt) "PCI: " fmt
>
> So should I add that already while using pr_info() in this patch,
> which would look weird / look like an unrelated change?
>
> Or should I not add that and manually add the "PCI: " prefix,
> requiring the pr_info to still be replaced in a subsequent patch
> converting the rest over to pr_info() ?
>
> IMHO it makes the most sense to keep printk here and then
> replace the printk with a pr_info, dropping the "PCI: "
> prefix in a subsequent patch converting all the printk-s.
>
> That would also make the subsequent patch cleaner, because
> replacing a pr_info with a pr_info in that patch would
> look weird.

OK
Hans de Goede Oct. 14, 2021, 3:15 p.m. UTC | #7
Hi,

On 10/14/21 3:40 PM, Christoph Hellwig wrote:
> On Thu, Oct 14, 2021 at 01:03:57PM +0200, Hans de Goede wrote:
>> +	 * Some BIOS-es contain a bug where they add addresses which map to system
>> +	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method,
> 
> Please avoid the overly long lines in your comments.

?

These lines are easily within the new 100 char line-length-limit ?

And checkpatch also does not complain about this.

Regards,

Hans
Christoph Hellwig Oct. 14, 2021, 3:16 p.m. UTC | #8
On Thu, Oct 14, 2021 at 05:15:02PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 10/14/21 3:40 PM, Christoph Hellwig wrote:
> > On Thu, Oct 14, 2021 at 01:03:57PM +0200, Hans de Goede wrote:
> >> +	 * Some BIOS-es contain a bug where they add addresses which map to system
> >> +	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method,
> > 
> > Please avoid the overly long lines in your comments.
> 
> ?
> 
> These lines are easily within the new 100 char line-length-limit ?

That is an exceptional limit if exceeding 80 improves readability for
individual lines.

> And checkpatch also does not complain about this.

checkpatch has been completely broken for a while and should not be
relied on in any way.
Hans de Goede Oct. 14, 2021, 3:24 p.m. UTC | #9
Hi,

On 10/14/21 5:16 PM, Christoph Hellwig wrote:
> On Thu, Oct 14, 2021 at 05:15:02PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 10/14/21 3:40 PM, Christoph Hellwig wrote:
>>> On Thu, Oct 14, 2021 at 01:03:57PM +0200, Hans de Goede wrote:
>>>> +	 * Some BIOS-es contain a bug where they add addresses which map to system
>>>> +	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method,
>>>
>>> Please avoid the overly long lines in your comments.
>>
>> ?
>>
>> These lines are easily within the new 100 char line-length-limit ?
> 
> That is an exceptional limit if exceeding 80 improves readability for
> individual lines.

Ok, I'll send out a v4 adjusting the comment-block to fit in 80 columns.

Regards,

Hans
Bjorn Helgaas Oct. 14, 2021, 3:49 p.m. UTC | #10
On Thu, Oct 14, 2021 at 01:03:57PM +0200, Hans de Goede wrote:
> Some BIOS-es contain a bug where they add addresses which map to system RAM
> in the PCI bridge memory window returned by the ACPI _CRS method, see
> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> space").
> 
> To avoid this Linux by default excludes E820 reservations when allocating
> addresses since 2010. Windows however ignores E820 reserved regions for PCI
> mem allocations, so in hindsight Linux honoring them is a problem.

This is a problem not because Windows ignores E820, but because the
spec does not support excluding based on E820.  We should cite the
spec and include the relevant text here.

> Recently (2020) some systems have shown-up with E820 reservations which
> cover the entire _CRS returned PCI bridge memory window, causing all
> attempts to assign memory to PCI BARs which have not been setup by the
> BIOS to fail. For example here are the relevant dmesg bits from a
> Lenovo IdeaPad 3 15IIL 81WE:
> 
>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
> 
> Ideally Linux would fully stop honoring E820 reservations for PCI mem
> allocations, but then the old systems this was added for will regress.
> Instead keep the old behavior for old systems, while ignoring the E820
> reservations like Windows does for any systems from now on.
> 
> Old systems are defined here as BIOS year < 2018, this was chosen to
> make sure that pci_use_e820 will not be set on the currently affected
> systems, while at the same time also taking into account that the
> systems for which the E820 checking was originally added may have
> received BIOS updates for quite a while (esp. CVE related ones),
> giving them a more recent BIOS year then 2010.
> 
> Also add pci=no_e820 and pci=use_e820 options to allow overriding
> the BIOS year heuristic.
> 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
> BugLink: https://bugs.launchpad.net/bugs/1878279
> BugLink: https://bugs.launchpad.net/bugs/1931715
> BugLink: https://bugs.launchpad.net/bugs/1932069
> BugLink: https://bugs.launchpad.net/bugs/1921649
> Cc: Benoit Grégoire <benoitg@coeus.ca>
> Cc: Hui Wang <hui.wang@canonical.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
> Changes in v3:
> - Commit msg tweaks (drop dmesg timestamps, typo fix)
> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
> - Add Mika's Reviewed-by
> 
> Changes in v2:
> - Replace the per model DMI quirk approach with disabling E820 reservations
>   checking for all systems with a BIOS year >= 2018
> - Add documentation for the new kernel-parameters to
>   Documentation/admin-guide/kernel-parameters.txt
> ---
> Other patches trying to address the same issue:
> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
> V1 patch:
> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
> ---
>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
>  arch/x86/kernel/resource.c                    |  4 +++
>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
>  arch/x86/pci/common.c                         |  6 ++++
>  5 files changed, 55 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 43dc35fe5bc0..969cde5d74c8 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3949,6 +3949,12 @@
>  				please report a bug.
>  		nocrs		[X86] Ignore PCI host bridge windows from ACPI.
>  				If you need to use this, please report a bug.
> +		use_e820	[X86] Honor E820 reservations when allocating
> +				PCI host bridge memory. If you need to use this,
> +				please report a bug.
> +		no_e820		[X86] ignore E820 reservations when allocating
> +				PCI host bridge memory. If you need to use this,
> +				please report a bug.

Inconsistent capitalization.

Should use same terminology as nocrs, i.e., "PCI host bridge windows",
not "PCI host bridge memory".

I don't think "Honor E820 reservations" is the right way to describe
this.  It's not a question of upholding something we *should* be
doing.  That would be a matter of working around a BIOS defect.
Maybe something like this:

  Use E820 reservations to exclude parts of PCI host bridge windows.
  This is a workaround for BIOS defects in host bridge _CRS methods.
  If you need to use this, please report a bug to
  linux-pci@vger.kernel.org so we can apply it automatically.

Not 100% sure we need "no_e820" since that should be the default.  But
I guess it's conceivable some system might need it.  I just hate
adding additional things for people to try and then spread the
resulting misinformation as a "fix" on random forums.

>  		routeirq	Do IRQ routing for all PCI devices.
>  				This is normally done in pci_enable_device(),
>  				so this option is a temporary workaround
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 490411dba438..0bb4e7dd0ffc 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -39,6 +39,8 @@ do {						\
>  #define PCI_ROOT_NO_CRS		0x100000
>  #define PCI_NOASSIGN_BARS	0x200000
>  #define PCI_BIG_ROOT_WINDOW	0x400000
> +#define PCI_USE_E820		0x800000
> +#define PCI_NO_E820		0x1000000
>  
>  extern unsigned int pci_probe;
>  extern unsigned long pirq_table_addr;
> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>  
>  /* pci-irq.c */
>  
> +struct pci_dev;
> +
>  struct irq_info {
>  	u8 bus, devfn;			/* Bus, device and function */
>  	struct {
> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
>  #endif
> +
> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> +extern bool pci_use_e820;
> +#else
> +#define pci_use_e820 false
> +#endif
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index 9b9fb7882c20..e8dc9bc327bd 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <linux/ioport.h>
>  #include <asm/e820/api.h>
> +#include <asm/pci_x86.h>
>  
>  static void resource_clip(struct resource *res, resource_size_t start,
>  			  resource_size_t end)
> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
>  	int i;
>  	struct e820_entry *entry;
>  
> +	if (!pci_use_e820)
> +		return;
> +
>  	for (i = 0; i < e820_table->nr_entries; i++) {
>  		entry = &e820_table->entries[i];
>  
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index 948656069cdd..6c2febe84b6f 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -21,6 +21,8 @@ struct pci_root_info {
>  
>  static bool pci_use_crs = true;
>  static bool pci_ignore_seg = false;
> +/* Consumed in arch/x86/kernel/resource.c */
> +bool pci_use_e820 = false;
>  
>  static int __init set_use_crs(const struct dmi_system_id *id)
>  {
> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
>  	       "if necessary, use \"pci=%s\" and report a bug\n",
>  	       pci_use_crs ? "Using" : "Ignoring",
>  	       pci_use_crs ? "nocrs" : "use_crs");
> +
> +	/*
> +	 * Some BIOS-es contain a bug where they add addresses which map to system
> +	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
> +	 * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
> +	 * To avoid this Linux by default excludes E820 reservations when allocating
> +	 * addresses since 2010. Windows however ignores E820 reserved regions for
> +	 * PCI mem allocations, so in hindsight Linux honoring them is a problem.
> +	 * In 2020 some systems have shown-up with E820 reservations which cover the
> +	 * entire _CRS returned PCI bridge memory window, causing all attempts to
> +	 * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
> +	 *
> +	 * Ideally Linux would fully stop honoring E820 reservations for PCI mem
> +	 * allocations, but then the old systems this was added for will regress.
> +	 * Instead keep the old behavior for old systems, while ignoring the E820
> +	 * reservations like Windows does for any systems from now on.

Wrap this to fit in 80 columns like the rest of the file.  More
important for the file to be internally consistent than any new
guidelines.

> +	 */
> +	if (year >= 0 && year < 2018)
> +		pci_use_e820 = true;
> +
> +	if (pci_probe & PCI_NO_E820)
> +		pci_use_e820 = false;
> +	else if (pci_probe & PCI_USE_E820)
> +		pci_use_e820 = true;
> +
> +	printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
> +	       pci_use_e820 ? "Honoring" : "Ignoring");
>  }
>  
>  #ifdef	CONFIG_PCI_MMCONFIG
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 3507f456fcd0..091ec7e94fcb 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
>  	} else if (!strcmp(str, "nocrs")) {
>  		pci_probe |= PCI_ROOT_NO_CRS;
>  		return NULL;
> +	} else if (!strcmp(str, "use_e820")) {
> +		pci_probe |= PCI_USE_E820;
> +		return NULL;
> +	} else if (!strcmp(str, "no_e820")) {
> +		pci_probe |= PCI_NO_E820;
> +		return NULL;

Why does pci_probe have to be involved here?  Couldn't we just set
pci_use_e820 directly?

Same argument applies to PCI_USE__CRS and PCI_ROOT_NO_CRS of course,
and I probably added those, so maybe there was a reason, or maybe I
just screwed up that too.

>  #ifdef CONFIG_PHYS_ADDR_T_64BIT
>  	} else if (!strcmp(str, "big_root_window")) {
>  		pci_probe |= PCI_BIG_ROOT_WINDOW;
> -- 
> 2.31.1
>
Hans de Goede Oct. 14, 2021, 5:06 p.m. UTC | #11
Hi Bjorn,

Thank you for the review.

On 10/14/21 5:49 PM, Bjorn Helgaas wrote:
> On Thu, Oct 14, 2021 at 01:03:57PM +0200, Hans de Goede wrote:
>> Some BIOS-es contain a bug where they add addresses which map to system RAM
>> in the PCI bridge memory window returned by the ACPI _CRS method, see
>> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
>> space").
>>
>> To avoid this Linux by default excludes E820 reservations when allocating
>> addresses since 2010. Windows however ignores E820 reserved regions for PCI
>> mem allocations, so in hindsight Linux honoring them is a problem.
> 
> This is a problem not because Windows ignores E820, but because the
> spec does not support excluding based on E820.  We should cite the
> spec and include the relevant text here.

Ok, for v5 I will replace this with your info on the specs from this email:
https://lore.kernel.org/linux-pci/20211008110149.GA1313872@bhelgaas/T/#t

> 
>> Recently (2020) some systems have shown-up with E820 reservations which
>> cover the entire _CRS returned PCI bridge memory window, causing all
>> attempts to assign memory to PCI BARs which have not been setup by the
>> BIOS to fail. For example here are the relevant dmesg bits from a
>> Lenovo IdeaPad 3 15IIL 81WE:
>>
>>  [mem 0x000000004bc50000-0x00000000cfffffff] reserved
>>  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
>>
>> Ideally Linux would fully stop honoring E820 reservations for PCI mem
>> allocations, but then the old systems this was added for will regress.
>> Instead keep the old behavior for old systems, while ignoring the E820
>> reservations like Windows does for any systems from now on.
>>
>> Old systems are defined here as BIOS year < 2018, this was chosen to
>> make sure that pci_use_e820 will not be set on the currently affected
>> systems, while at the same time also taking into account that the
>> systems for which the E820 checking was originally added may have
>> received BIOS updates for quite a while (esp. CVE related ones),
>> giving them a more recent BIOS year then 2010.
>>
>> Also add pci=no_e820 and pci=use_e820 options to allow overriding
>> the BIOS year heuristic.
>>
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
>> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
>> BugLink: https://bugs.launchpad.net/bugs/1878279
>> BugLink: https://bugs.launchpad.net/bugs/1931715
>> BugLink: https://bugs.launchpad.net/bugs/1932069
>> BugLink: https://bugs.launchpad.net/bugs/1921649
>> Cc: Benoit Grégoire <benoitg@coeus.ca>
>> Cc: Hui Wang <hui.wang@canonical.com>
>> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>> Changes in v3:
>> - Commit msg tweaks (drop dmesg timestamps, typo fix)
>> - Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
>> - Add Mika's Reviewed-by
>>
>> Changes in v2:
>> - Replace the per model DMI quirk approach with disabling E820 reservations
>>   checking for all systems with a BIOS year >= 2018
>> - Add documentation for the new kernel-parameters to
>>   Documentation/admin-guide/kernel-parameters.txt
>> ---
>> Other patches trying to address the same issue:
>> https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
>> https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel.com
>> V1 patch:
>> https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  6 ++++
>>  arch/x86/include/asm/pci_x86.h                | 10 +++++++
>>  arch/x86/kernel/resource.c                    |  4 +++
>>  arch/x86/pci/acpi.c                           | 29 +++++++++++++++++++
>>  arch/x86/pci/common.c                         |  6 ++++
>>  5 files changed, 55 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 43dc35fe5bc0..969cde5d74c8 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3949,6 +3949,12 @@
>>  				please report a bug.
>>  		nocrs		[X86] Ignore PCI host bridge windows from ACPI.
>>  				If you need to use this, please report a bug.
>> +		use_e820	[X86] Honor E820 reservations when allocating
>> +				PCI host bridge memory. If you need to use this,
>> +				please report a bug.
>> +		no_e820		[X86] ignore E820 reservations when allocating
>> +				PCI host bridge memory. If you need to use this,
>> +				please report a bug.
> 
> Inconsistent capitalization.
> 
> Should use same terminology as nocrs, i.e., "PCI host bridge windows",
> not "PCI host bridge memory".
> 
> I don't think "Honor E820 reservations" is the right way to describe
> this.  It's not a question of upholding something we *should* be
> doing.  That would be a matter of working around a BIOS defect.
> Maybe something like this:
> 
>   Use E820 reservations to exclude parts of PCI host bridge windows.
>   This is a workaround for BIOS defects in host bridge _CRS methods.
>   If you need to use this, please report a bug to
>   linux-pci@vger.kernel.org so we can apply it automatically.

Ok, that works for me, I'll use the above text for v5.


> Not 100% sure we need "no_e820" since that should be the default.  But
> I guess it's conceivable some system might need it.  I just hate
> adding additional things for people to try and then spread the
> resulting misinformation as a "fix" on random forums.

no_e820 will be the default for BIOS-year >= 2018, but on older
systems my patch preserves the old behavior of use_e820.

So to allow people to easily test if no_e820 behavior helps on
older systems we need it.

I fully agree with your worry about 'misinformation as a "fix"
on random forums', but I don't have a solution for that.

I guess we could completely omit both cmdline options, but
that will make debugging issues a lot header. IMHO it is
worthwhile to have this options to allow users to quickly
test things when we ask them to do so.

> 
>>  		routeirq	Do IRQ routing for all PCI devices.
>>  				This is normally done in pci_enable_device(),
>>  				so this option is a temporary workaround
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index 490411dba438..0bb4e7dd0ffc 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -39,6 +39,8 @@ do {						\
>>  #define PCI_ROOT_NO_CRS		0x100000
>>  #define PCI_NOASSIGN_BARS	0x200000
>>  #define PCI_BIG_ROOT_WINDOW	0x400000
>> +#define PCI_USE_E820		0x800000
>> +#define PCI_NO_E820		0x1000000
>>  
>>  extern unsigned int pci_probe;
>>  extern unsigned long pirq_table_addr;
>> @@ -64,6 +66,8 @@ void pcibios_scan_specific_bus(int busn);
>>  
>>  /* pci-irq.c */
>>  
>> +struct pci_dev;
>> +
>>  struct irq_info {
>>  	u8 bus, devfn;			/* Bus, device and function */
>>  	struct {
>> @@ -232,3 +236,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>>  # define x86_default_pci_init_irq	NULL
>>  # define x86_default_pci_fixup_irqs	NULL
>>  #endif
>> +
>> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
>> +extern bool pci_use_e820;
>> +#else
>> +#define pci_use_e820 false
>> +#endif
>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>> index 9b9fb7882c20..e8dc9bc327bd 100644
>> --- a/arch/x86/kernel/resource.c
>> +++ b/arch/x86/kernel/resource.c
>> @@ -1,6 +1,7 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include <linux/ioport.h>
>>  #include <asm/e820/api.h>
>> +#include <asm/pci_x86.h>
>>  
>>  static void resource_clip(struct resource *res, resource_size_t start,
>>  			  resource_size_t end)
>> @@ -28,6 +29,9 @@ static void remove_e820_regions(struct resource *avail)
>>  	int i;
>>  	struct e820_entry *entry;
>>  
>> +	if (!pci_use_e820)
>> +		return;
>> +
>>  	for (i = 0; i < e820_table->nr_entries; i++) {
>>  		entry = &e820_table->entries[i];
>>  
>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>> index 948656069cdd..6c2febe84b6f 100644
>> --- a/arch/x86/pci/acpi.c
>> +++ b/arch/x86/pci/acpi.c
>> @@ -21,6 +21,8 @@ struct pci_root_info {
>>  
>>  static bool pci_use_crs = true;
>>  static bool pci_ignore_seg = false;
>> +/* Consumed in arch/x86/kernel/resource.c */
>> +bool pci_use_e820 = false;
>>  
>>  static int __init set_use_crs(const struct dmi_system_id *id)
>>  {
>> @@ -160,6 +162,33 @@ void __init pci_acpi_crs_quirks(void)
>>  	       "if necessary, use \"pci=%s\" and report a bug\n",
>>  	       pci_use_crs ? "Using" : "Ignoring",
>>  	       pci_use_crs ? "nocrs" : "use_crs");
>> +
>> +	/*
>> +	 * Some BIOS-es contain a bug where they add addresses which map to system
>> +	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
>> +	 * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
>> +	 * To avoid this Linux by default excludes E820 reservations when allocating
>> +	 * addresses since 2010. Windows however ignores E820 reserved regions for
>> +	 * PCI mem allocations, so in hindsight Linux honoring them is a problem.
>> +	 * In 2020 some systems have shown-up with E820 reservations which cover the
>> +	 * entire _CRS returned PCI bridge memory window, causing all attempts to
>> +	 * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
>> +	 *
>> +	 * Ideally Linux would fully stop honoring E820 reservations for PCI mem
>> +	 * allocations, but then the old systems this was added for will regress.
>> +	 * Instead keep the old behavior for old systems, while ignoring the E820
>> +	 * reservations like Windows does for any systems from now on.
> 
> Wrap this to fit in 80 columns like the rest of the file.  More
> important for the file to be internally consistent than any new
> guidelines.

Already fixed in v4 (the only change in v4). I'll also drop the "Windows..."
sentence which you did not like in the commit msg for v5.

> 
>> +	 */
>> +	if (year >= 0 && year < 2018)
>> +		pci_use_e820 = true;
>> +
>> +	if (pci_probe & PCI_NO_E820)
>> +		pci_use_e820 = false;
>> +	else if (pci_probe & PCI_USE_E820)
>> +		pci_use_e820 = true;
>> +
>> +	printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
>> +	       pci_use_e820 ? "Honoring" : "Ignoring");
>>  }
>>  
>>  #ifdef	CONFIG_PCI_MMCONFIG
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index 3507f456fcd0..091ec7e94fcb 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -595,6 +595,12 @@ char *__init pcibios_setup(char *str)
>>  	} else if (!strcmp(str, "nocrs")) {
>>  		pci_probe |= PCI_ROOT_NO_CRS;
>>  		return NULL;
>> +	} else if (!strcmp(str, "use_e820")) {
>> +		pci_probe |= PCI_USE_E820;
>> +		return NULL;
>> +	} else if (!strcmp(str, "no_e820")) {
>> +		pci_probe |= PCI_NO_E820;
>> +		return NULL;
> 
> Why does pci_probe have to be involved here?  Couldn't we just set
> pci_use_e820 directly?

This code runs early on, before the heuristics which set
pci_use_e820 gets set based on the BIOS year, so this is
done indirectly so that the code doing the heuristics
(and checking DMI quirks/exceptions if we add any)
can set a default and then these flags are checked to
override the default.

> Same argument applies to PCI_USE__CRS and PCI_ROOT_NO_CRS of course,
> and I probably added those, so maybe there was a reason, or maybe I
> just screwed up that too.

What I wrote above, also applies to the use_crs handling, which is
handled the same for the same reason :)

Regards,

Hans
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..969cde5d74c8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3949,6 +3949,12 @@ 
 				please report a bug.
 		nocrs		[X86] Ignore PCI host bridge windows from ACPI.
 				If you need to use this, please report a bug.
+		use_e820	[X86] Honor E820 reservations when allocating
+				PCI host bridge memory. If you need to use this,
+				please report a bug.
+		no_e820		[X86] ignore E820 reservations when allocating
+				PCI host bridge memory. If you need to use this,
+				please report a bug.
 		routeirq	Do IRQ routing for all PCI devices.
 				This is normally done in pci_enable_device(),
 				so this option is a temporary workaround
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 490411dba438..0bb4e7dd0ffc 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -39,6 +39,8 @@  do {						\
 #define PCI_ROOT_NO_CRS		0x100000
 #define PCI_NOASSIGN_BARS	0x200000
 #define PCI_BIG_ROOT_WINDOW	0x400000
+#define PCI_USE_E820		0x800000
+#define PCI_NO_E820		0x1000000
 
 extern unsigned int pci_probe;
 extern unsigned long pirq_table_addr;
@@ -64,6 +66,8 @@  void pcibios_scan_specific_bus(int busn);
 
 /* pci-irq.c */
 
+struct pci_dev;
+
 struct irq_info {
 	u8 bus, devfn;			/* Bus, device and function */
 	struct {
@@ -232,3 +236,9 @@  static inline void mmio_config_writel(void __iomem *pos, u32 val)
 # define x86_default_pci_init_irq	NULL
 # define x86_default_pci_fixup_irqs	NULL
 #endif
+
+#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
+extern bool pci_use_e820;
+#else
+#define pci_use_e820 false
+#endif
diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index 9b9fb7882c20..e8dc9bc327bd 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -1,6 +1,7 @@ 
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/ioport.h>
 #include <asm/e820/api.h>
+#include <asm/pci_x86.h>
 
 static void resource_clip(struct resource *res, resource_size_t start,
 			  resource_size_t end)
@@ -28,6 +29,9 @@  static void remove_e820_regions(struct resource *avail)
 	int i;
 	struct e820_entry *entry;
 
+	if (!pci_use_e820)
+		return;
+
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		entry = &e820_table->entries[i];
 
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 948656069cdd..6c2febe84b6f 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -21,6 +21,8 @@  struct pci_root_info {
 
 static bool pci_use_crs = true;
 static bool pci_ignore_seg = false;
+/* Consumed in arch/x86/kernel/resource.c */
+bool pci_use_e820 = false;
 
 static int __init set_use_crs(const struct dmi_system_id *id)
 {
@@ -160,6 +162,33 @@  void __init pci_acpi_crs_quirks(void)
 	       "if necessary, use \"pci=%s\" and report a bug\n",
 	       pci_use_crs ? "Using" : "Ignoring",
 	       pci_use_crs ? "nocrs" : "use_crs");
+
+	/*
+	 * Some BIOS-es contain a bug where they add addresses which map to system
+	 * RAM in the PCI bridge memory window returned by the ACPI _CRS method, see
+	 * commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
+	 * To avoid this Linux by default excludes E820 reservations when allocating
+	 * addresses since 2010. Windows however ignores E820 reserved regions for
+	 * PCI mem allocations, so in hindsight Linux honoring them is a problem.
+	 * In 2020 some systems have shown-up with E820 reservations which cover the
+	 * entire _CRS returned PCI bridge memory window, causing all attempts to
+	 * assign memory to PCI BARs to fail if Linux honors the E820 reservations.
+	 *
+	 * Ideally Linux would fully stop honoring E820 reservations for PCI mem
+	 * allocations, but then the old systems this was added for will regress.
+	 * Instead keep the old behavior for old systems, while ignoring the E820
+	 * reservations like Windows does for any systems from now on.
+	 */
+	if (year >= 0 && year < 2018)
+		pci_use_e820 = true;
+
+	if (pci_probe & PCI_NO_E820)
+		pci_use_e820 = false;
+	else if (pci_probe & PCI_USE_E820)
+		pci_use_e820 = true;
+
+	printk(KERN_INFO "PCI: %s E820 reservations for host bridge windows\n",
+	       pci_use_e820 ? "Honoring" : "Ignoring");
 }
 
 #ifdef	CONFIG_PCI_MMCONFIG
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 3507f456fcd0..091ec7e94fcb 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -595,6 +595,12 @@  char *__init pcibios_setup(char *str)
 	} else if (!strcmp(str, "nocrs")) {
 		pci_probe |= PCI_ROOT_NO_CRS;
 		return NULL;
+	} else if (!strcmp(str, "use_e820")) {
+		pci_probe |= PCI_USE_E820;
+		return NULL;
+	} else if (!strcmp(str, "no_e820")) {
+		pci_probe |= PCI_NO_E820;
+		return NULL;
 #ifdef CONFIG_PHYS_ADDR_T_64BIT
 	} else if (!strcmp(str, "big_root_window")) {
 		pci_probe |= PCI_BIG_ROOT_WINDOW;