Message ID | 20100901213318.19353.54619.stgit@localhost.localdomain |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, Sep 01, 2010 at 02:33:18PM -0700, Peter P Waskiewicz Jr wrote: > On certain BIOSes, SRAT enumeration isn't exported correctly. > This leads to NUMA node enumeration failure, and causes the kernel > to fall back onto a single node treated as flat memory. This > can happen on large, multi-socket systems (4 or more sockets), and > becomes problematic for performance. > > This patch adds a boot parameter to allow a kernel to be booted > with the option to skip the SRAT check. There are BIOSes in > production that have these failures, so this will allow people > in the field to work around these BIOS issues. Looks good. Reviewed-by: Andi Kleen <ak@linux.intel.com> -Andi -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, September 1, 2010 2:33 pm, Peter P Waskiewicz Jr wrote: > Documentation/x86/x86_64/boot-options.txt | 4 ++++ > arch/x86/mm/srat_64.c | 20 +++++++++++++++++--- 2 > files changed, 21 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c index > f9897f7..8719472 100644 --- a/arch/x86/mm/srat_64.c > +++ b/arch/x86/mm/srat_64.c > @@ -351,6 +351,15 @@ int __init acpi_get_nodes(struct bootnode *physnodes) > return ret; } > > > +int srat_bypass_bios = 0; static int srat_bypass_bios; Don't need to init to 0. > + > +static int __init srat_bypass_bios_setup(char *str) > +{ > + srat_bypass_bios = 1; > + return 0; > +} > +early_param("sratbypassbios", srat_bypass_bios_setup); > + > /* Use the information discovered above to actually set up the nodes. */ > int __init acpi_scan_nodes(unsigned long start, unsigned long end) { > @@ -425,9 +434,14 @@ int __init acpi_scan_nodes(unsigned long start, > unsigned long end) nodes[i].end >> PAGE_SHIFT); /* for out of order entries > in SRAT */ sort_node_map(); - if (!nodes_cover_memory(nodes)) { > - bad_srat(); > - return -1; > + if (!srat_bypass_bios) { > + if (!nodes_cover_memory(nodes)) { > + bad_srat(); > + return -1; > + } > + } else { > + printk(KERN_INFO > + "SRAT: Bypassing NUMA sanity check...bad BIOS...\n"); > } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2010-09-01 at 15:04 -0700, rdunlap@xenotime.net wrote: > On Wed, September 1, 2010 2:33 pm, Peter P Waskiewicz Jr wrote: > > Documentation/x86/x86_64/boot-options.txt | 4 ++++ > > arch/x86/mm/srat_64.c | 20 +++++++++++++++++--- 2 > > files changed, 21 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c index > > f9897f7..8719472 100644 --- a/arch/x86/mm/srat_64.c > > +++ b/arch/x86/mm/srat_64.c > > @@ -351,6 +351,15 @@ int __init acpi_get_nodes(struct bootnode *physnodes) > > return ret; } > > > > > > +int srat_bypass_bios = 0; > > static int srat_bypass_bios; > > Don't need to init to 0. I'll fix and send a second version. Thanks Randy. -PJ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> wrote: > On certain BIOSes, SRAT enumeration isn't exported correctly. This > leads to NUMA node enumeration failure, and causes the kernel to fall > back onto a single node treated as flat memory. This can happen on > large, multi-socket systems (4 or more sockets), and becomes > problematic for performance. > > This patch adds a boot parameter to allow a kernel to be booted with > the option to skip the SRAT check. There are BIOSes in production > that have these failures, so this will allow people in the field to > work around these BIOS issues. > + sratbypassbios > + If specified, will skip an SRAT check for PXM coverage > + from BIOS enumeration. Only to be used on systems with > + buggy BIOSes that munge the SRAT enumeration. This isnt a particularly useful solution to users of said systems - they have to figure out that this option exists, and then they have to enter this option on the boot line. A better solution would be to match these systems using DMI filters - _and_ to also have the boot time option to cover the case where a new system comes out with such a breakage. (or there's some system not yet mapped) Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> This isnt a particularly useful solution to users of said systems - they > have to figure out that this option exists, and then they have to enter > this option on the boot line. This usually only happens in early preproduction systems. So far the BIOS always got fixed before they shipped to users. But it's still useful to have this option so that the early users have a workaround. It's difficult to maintain DMI strings for early systems, they are too bleeding edge and the entries would be stale as soon as the BIOS is fixed. -Andi
* Andi Kleen <andi@firstfloor.org> wrote: > > This isnt a particularly useful solution to users of said systems - > > they have to figure out that this option exists, and then they have > > to enter this option on the boot line. > > This usually only happens in early preproduction systems. So far the > BIOS always got fixed before they shipped to users. 'Usually' != 'always'. Read the changelog: ' There are BIOSes in production that have these failures, so this will allow people in the field to work around these BIOS issues. ' Peter, which system in production that has this problem? That one needs a DMI match. Preproduction systems can certainly be hacked around with the boot option and are not worth the DMI match. (preproduction systems also tend to have have very unspecific DMI strings in most cases - things like '1234567890' or 'to be filled by OEM') In important cases a PCI ID match can be done for serious bugs in widespread preproduction systems - we have a handful of examples in-tree for such workarounds - but i doubt this is such a case - we use it for crasher bugs, not for 'boots up slow' cases. Nevertheless, production systems need a DMI match. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > +int srat_bypass_bios = 0; > > static int srat_bypass_bios; > > Don't need to init to 0. It actually doesn't matter to gcc, it generates the same code for both cases. -Andi -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-09-02 at 23:39 -0700, Ingo Molnar wrote: > * Andi Kleen <andi@firstfloor.org> wrote: > > > > This isnt a particularly useful solution to users of said systems - > > > they have to figure out that this option exists, and then they have > > > to enter this option on the boot line. > > > > This usually only happens in early preproduction systems. So far the > > BIOS always got fixed before they shipped to users. > > 'Usually' != 'always'. Read the changelog: > > ' There are BIOSes in production that have these failures, so this will > allow people in the field to work around these BIOS issues. ' > > Peter, which system in production that has this problem? That one needs > a DMI match. It's one SKU of a Nehalem-EX system. The BIOS for that SKU has an issue with resolving SRAT hotplug enumeration, and screws up the table. Other SKU's of this same platform do not have the issue. Efforts are underway to get this BIOS fixed, but in the meantime, there's nothing for users to work around the bug (aside from disabling memory hotplug in the BIOS). Another platform almost shipped with the same symptoms, but caught it and had it fixed before it shipped (didn't catch it early because Windows wasn't failing, and most of the testing on that platform was done under Windows). I agree with Andi that adding DMI strings would be overkill and would leave clutter once the BIOS is fixed. I look at this patch as a stop-gap measure for people to fall back on until a newer BIOS is available to correct the NUMA enumeration issues. Without it, we have nothing to point users to when they run into this, waiting for a new BIOS. Cheers, -PJ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> wrote: > On Thu, 2010-09-02 at 23:39 -0700, Ingo Molnar wrote: > > * Andi Kleen <andi@firstfloor.org> wrote: > > > > > > This isnt a particularly useful solution to users of said systems - > > > > they have to figure out that this option exists, and then they have > > > > to enter this option on the boot line. > > > > > > This usually only happens in early preproduction systems. So far the > > > BIOS always got fixed before they shipped to users. > > > > 'Usually' != 'always'. Read the changelog: > > > > ' There are BIOSes in production that have these failures, so this will > > allow people in the field to work around these BIOS issues. ' > > > > Peter, which system in production that has this problem? That one needs > > a DMI match. > > It's one SKU of a Nehalem-EX system. The BIOS for that SKU has an > issue with resolving SRAT hotplug enumeration, and screws up the > table. Other SKU's of this same platform do not have the issue. > Efforts are underway to get this BIOS fixed, but in the meantime, > there's nothing for users to work around the bug (aside from disabling > memory hotplug in the BIOS). Another platform almost shipped with the > same symptoms, but caught it and had it fixed before it shipped > (didn't catch it early because Windows wasn't failing, and most of the > testing on that platform was done under Windows). > > I agree with Andi that adding DMI strings would be overkill and would > leave clutter once the BIOS is fixed. [...] We use the following policy for hardware/firmware workarounds in upstream arch/x86: if the system got shipped and if the vendor/OEM wants it fixed, then it has real DMI info (or some PCI ID match method, etc.) and an automatic workaround is very well possible and desirable. If the vendor cannot be bothered to add a few lines based on a simple reading of dmidecode output and test it, then we dont really want/need the rest of the patch upstream either. It should be literally 5 minutes of work to add a DMI match. > I look at this patch as a stop-gap measure for people to fall back on > until a newer BIOS is available to correct the NUMA enumeration > issues. [...] We dont do half-done stop-gap measures in the upstream kernel like that, and for various good reasons. Furthermore, since Windows doesnt have a problem booting with this, i'm afraid that we are bound to see repeat problems of this sort, so we better have the DMI path beaten out - even if in this case it's a single model. > [...] Without it, we have nothing to point users to when they run > into this, waiting for a new BIOS. I by all means support you to give users a real fix - one that applies the workaround automatically with a DMI match. Also, as i said, we can also add the boot option in the same patch. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/07/2010 12:38 PM, Peter P Waskiewicz Jr wrote: > > It's one SKU of a Nehalem-EX system. The BIOS for that SKU has an issue > with resolving SRAT hotplug enumeration, and screws up the table. Other > SKU's of this same platform do not have the issue. Efforts are underway > to get this BIOS fixed, but in the meantime, there's nothing for users > to work around the bug (aside from disabling memory hotplug in the > BIOS). Another platform almost shipped with the same symptoms, but > caught it and had it fixed before it shipped (didn't catch it early > because Windows wasn't failing, and most of the testing on that platform > was done under Windows). > > I agree with Andi that adding DMI strings would be overkill and would > leave clutter once the BIOS is fixed. I look at this patch as a > stop-gap measure for people to fall back on until a newer BIOS is > available to correct the NUMA enumeration issues. Without it, we have > nothing to point users to when they run into this, waiting for a new > BIOS. > No, this is exactly the kind of stuff for which a DMI match is ideal. A specific system with bounded propagation of the problem. Thus, the DMI match acts as a whitelist -- "we know this system and it is safe to activate this hack on it." This is a very good thing. If this is a production BIOS it should have this information. -hpa -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2010-09-07 at 13:03 -0700, H. Peter Anvin wrote: > On 09/07/2010 12:38 PM, Peter P Waskiewicz Jr wrote: > > > > It's one SKU of a Nehalem-EX system. The BIOS for that SKU has an issue > > with resolving SRAT hotplug enumeration, and screws up the table. Other > > SKU's of this same platform do not have the issue. Efforts are underway > > to get this BIOS fixed, but in the meantime, there's nothing for users > > to work around the bug (aside from disabling memory hotplug in the > > BIOS). Another platform almost shipped with the same symptoms, but > > caught it and had it fixed before it shipped (didn't catch it early > > because Windows wasn't failing, and most of the testing on that platform > > was done under Windows). > > > > I agree with Andi that adding DMI strings would be overkill and would > > leave clutter once the BIOS is fixed. I look at this patch as a > > stop-gap measure for people to fall back on until a newer BIOS is > > available to correct the NUMA enumeration issues. Without it, we have > > nothing to point users to when they run into this, waiting for a new > > BIOS. > > > > No, this is exactly the kind of stuff for which a DMI match is ideal. A > specific system with bounded propagation of the problem. Thus, the DMI > match acts as a whitelist -- "we know this system and it is safe to > activate this hack on it." This is a very good thing. > > If this is a production BIOS it should have this information. Responding for both your and Ingo's last email, I'll work on getting a DMI match for this system. Thanks, -PJ
On 09/07/2010 01:16 PM, Peter P Waskiewicz Jr wrote: >> >> If this is a production BIOS it should have this information. > > Responding for both your and Ingo's last email, I'll work on getting a > DMI match for this system. > Great. -hpa -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"H. Peter Anvin" <hpa@zytor.com> writes: > > If this is a production BIOS it should have this information. The point was -- it wasn't a production BIOS and the production BIOS will be fixed. If you want you call fill the DMI tables with all bugs in preproduction BIOS, but I suspect that will fill them up rather quickly. Preproduction BIOS tend to be buggy (that is why they are pre-production) Another problem also with adding stuff for early BIOS is that the later BIOS likely have this fixed and then you have a quirk that runs but is not needed (unless you add another DMI match for the fixed production BIOS etc.) While for this case it would be probably harmless, this has caused problems in the past with other quirks. Normally early users are also ok with having to specify an option as a workaround, they just need something they can specify. IMHO also it's good coding practice to have a command line option for every DMI quirk anyways. This makes it easier to let users test if they need a particular quirk and helps with users running older kernels. -Andi
On 09/07/2010 11:55 PM, Andi Kleen wrote: > "H. Peter Anvin" <hpa@zytor.com> writes: >> >> If this is a production BIOS it should have this information. > > The point was -- it wasn't a production BIOS and the production > BIOS will be fixed. > From the original description: This patch adds a boot parameter to allow a kernel to be booted with the option to skip the SRAT check. There are BIOSes in ^^^^^^^^^^^^^^^^^^^ production that have these failures, so this will allow people ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ in the field to work around these BIOS issues. Agreed that command-line option is a good thing to have in general and that it is not necessary to have DMI quirks for preproduction BIOSes. -hpa
On Wed, 2010-09-08 at 11:03 -0700, H. Peter Anvin wrote: > On 09/07/2010 11:55 PM, Andi Kleen wrote: > > "H. Peter Anvin" <hpa@zytor.com> writes: > >> > >> If this is a production BIOS it should have this information. > > > > The point was -- it wasn't a production BIOS and the production > > BIOS will be fixed. > > > > From the original description: > > This patch adds a boot parameter to allow a kernel to be booted > with the option to skip the SRAT check. There are BIOSes in > ^^^^^^^^^^^^^^^^^^^ > production that have these failures, so this will allow people > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > in the field to work around these BIOS issues. > > > Agreed that command-line option is a good thing to have in general and > that it is not necessary to have DMI quirks for preproduction BIOSes. Right. The new patchset (when I finish it) will have the DMI quirk for this specific production NHM-EX, plus the command-line option. -PJ
On 09/01/2010 03:11 PM, Peter P Waskiewicz Jr wrote: > On Wed, 2010-09-01 at 15:04 -0700, rdunlap@xenotime.net wrote: >> On Wed, September 1, 2010 2:33 pm, Peter P Waskiewicz Jr wrote: >>> Documentation/x86/x86_64/boot-options.txt | 4 ++++ >>> arch/x86/mm/srat_64.c | 20 +++++++++++++++++--- 2 >>> files changed, 21 insertions(+), 3 deletions(-) >>> >>> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c index >>> f9897f7..8719472 100644 --- a/arch/x86/mm/srat_64.c >>> +++ b/arch/x86/mm/srat_64.c >>> @@ -351,6 +351,15 @@ int __init acpi_get_nodes(struct bootnode *physnodes) >>> return ret; } >>> >>> >>> +int srat_bypass_bios = 0; >> >> static int srat_bypass_bios; >> >> Don't need to init to 0. > > I'll fix and send a second version. Thanks Randy. > Should be static *bool* srat_bypass_bios; Please use bool for actual booleans; it prevents someone from being "smart" and using values like 2 or -1 for "special cases". -hpa -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt index 7fbbaf8..7863d9c 100644 --- a/Documentation/x86/x86_64/boot-options.txt +++ b/Documentation/x86/x86_64/boot-options.txt @@ -316,3 +316,7 @@ Miscellaneous Do not use GB pages for kernel direct mappings. gbpages Use GB pages for kernel direct mappings. + sratbypassbios + If specified, will skip an SRAT check for PXM coverage + from BIOS enumeration. Only to be used on systems with + buggy BIOSes that munge the SRAT enumeration. diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c index f9897f7..8719472 100644 --- a/arch/x86/mm/srat_64.c +++ b/arch/x86/mm/srat_64.c @@ -351,6 +351,15 @@ int __init acpi_get_nodes(struct bootnode *physnodes) return ret; } +int srat_bypass_bios = 0; + +static int __init srat_bypass_bios_setup(char *str) +{ + srat_bypass_bios = 1; + return 0; +} +early_param("sratbypassbios", srat_bypass_bios_setup); + /* Use the information discovered above to actually set up the nodes. */ int __init acpi_scan_nodes(unsigned long start, unsigned long end) { @@ -425,9 +434,14 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end) nodes[i].end >> PAGE_SHIFT); /* for out of order entries in SRAT */ sort_node_map(); - if (!nodes_cover_memory(nodes)) { - bad_srat(); - return -1; + if (!srat_bypass_bios) { + if (!nodes_cover_memory(nodes)) { + bad_srat(); + return -1; + } + } else { + printk(KERN_INFO + "SRAT: Bypassing NUMA sanity check...bad BIOS...\n"); } /* Account for nodes with cpus and no memory */
On certain BIOSes, SRAT enumeration isn't exported correctly. This leads to NUMA node enumeration failure, and causes the kernel to fall back onto a single node treated as flat memory. This can happen on large, multi-socket systems (4 or more sockets), and becomes problematic for performance. This patch adds a boot parameter to allow a kernel to be booted with the option to skip the SRAT check. There are BIOSes in production that have these failures, so this will allow people in the field to work around these BIOS issues. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> --- Documentation/x86/x86_64/boot-options.txt | 4 ++++ arch/x86/mm/srat_64.c | 20 +++++++++++++++++--- 2 files changed, 21 insertions(+), 3 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html