[SRU,Bionic,0/1] smpboot: don't call topology_sane() when Sub-NUMA-Clustering is enabled

Message ID 20200608041814.23501-1-matthew.ruffell@canonical.com

Message

Matthew Ruffell June 8, 2020, 4:18 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1882478

[Impact]

Intel Skylake server processors and later have a different Last Level Cache
(LLC) topology than earlier processors. They also introduce a feature called
Sub-NUMA-Clustering (SNC), which is similar to the Cluster-On-Die (CoD)
feature that earlier server processors had.

Sub-NUMA-Clustering divides the system into two "slices", each of which is
allocated half the CPU cores, half the Last Level Cache and one memory
controller. Each slice is enumerated as a NUMA node.

The difference between Sub-NUMA-Clustering and Cluster-On-Die is how the Last
Level Cache is exposed to each NUMA node. CoD had the same cache line present in
each half of the LLC. In SNC, each cache line is only present in its respective
slice. Because of this, the semantics of accessing the LLC change: a process
accessing NUMA-local memory only sees half the LLC capacity.

On systems with Sub-NUMA-Clustering enabled, on the Xenial 4.4 and Bionic 4.15
kernels we see the following oops during NUMA node enumeration:

.... node #0, CPUs: #1 #2 #3 #4 #5 #6
.... node #1, CPUs: #7
sched: CPU #7's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
WARNING: CPU: 7 PID: 0 at /build/linux-hwe-F5opqf/linux-hwe-4.15.0/arch/x86/kernel/smpboot.c:375 topology_sane.isra.4+0x6c/0x70
Modules linked in:
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.15.0-47-generic #50~16.04.1-Ubuntu
Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
RIP: 0010:topology_sane.isra.4+0x6c/0x70
Call Trace:
set_cpu_sibling_map+0x153/0x540
start_secondary+0xb2/0x200
secondary_startup_64+0xa5/0xb0
#8 #9 #10 #11 #12 #13
.... node #0, CPUs: #14 #15 #16 #17 #18 #19 #20
.... node #1, CPUs: #21 #22 #23 #24 #25 #26 #27
smp: Brought up 2 nodes, 28 CPUs 

This was with an Intel Xeon Gold 5120 CPU on an HP DL360 Gen10.

The oops happens because topology_sane() expects CPUs that share a Last Level
Cache to be on the same NUMA node, which is no longer the case when SNC is
enabled.
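
As a rough illustration of why the check trips, here is a minimal user-space
sketch in C. It is not the kernel code: struct cpu_topo and the *_model and
*_before_fix names are hypothetical stand-ins, and the only behaviour modelled
is the one visible in the log above (two CPUs report the same LLC but sit on
different NUMA nodes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's per-CPU topology data. */
struct cpu_topo {
	int cpu;    /* logical CPU number */
	int llc_id; /* id of the Last Level Cache this CPU reports */
	int node;   /* NUMA node the CPU is enumerated on */
};

/* Model of the sanity check: LLC siblings are expected to share a node. */
static bool topology_sane_model(const struct cpu_topo *c,
				const struct cpu_topo *o)
{
	assert(c && o);
	if (c->node == o->node)
		return true;

	fprintf(stderr,
		"sched: CPU #%d's llc-sibling CPU #%d is not on the same node! "
		"[node: %d != %d]. Ignoring dependency.\n",
		c->cpu, o->cpu, c->node, o->node);
	return false;
}

/* Pre-fix model: every pair with the same LLC id goes through the check. */
static bool match_llc_before_fix(const struct cpu_topo *c,
				 const struct cpu_topo *o)
{
	assert(c && o);
	if (c->llc_id != o->llc_id)
		return false;
	return topology_sane_model(c, o);
}
```

With SNC enabled, CPU #7 (node 1) and CPU #0 (node 0) would carry the same
llc_id in this model, so the pair reaches the sanity check and fails it.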

[Fix]

The fix comes in the form of the following upstream commit, which landed in
Linux 4.17:

commit 1340ccfa9a9afefdbab90d7935d4ed19817e37c2
Author: Alison Schofield <alison.schofield@intel.com>
Date: Fri Apr 6 17:21:30 2018 -0700
Subject: x86,sched: Allow topologies where NUMA nodes share an LLC
Link: https://github.com/torvalds/linux/commit/1340ccfa9a9afefdbab90d7935d4ed19817e37c2 

The commit adds a check for this particular family of Intel processors, and if
the CPU family matches, it simply skips the call to topology_sane().
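
The shape of that change can be sketched in user-space C as follows. This is
a simplified model rather than the upstream patch: struct cpu_topo, the
snc_capable flag (standing in for the kernel's processor match) and the
warnings_raised counter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's per-CPU topology data. */
struct cpu_topo {
	int cpu;    /* logical CPU number */
	int llc_id; /* id of the Last Level Cache this CPU reports */
	int node;   /* NUMA node the CPU is enumerated on */
};

/* Counts how often the modelled sanity check had to complain. */
static int warnings_raised;

/* Model of the sanity check: LLC siblings are expected to share a node. */
static bool topology_sane_model(const struct cpu_topo *c,
				const struct cpu_topo *o)
{
	assert(c && o);
	if (c->node == o->node)
		return true;
	warnings_raised++;
	return false;
}

/* Post-fix model: skip the sanity check for the expected SNC layout. */
static bool match_llc_after_fix(const struct cpu_topo *c,
				const struct cpu_topo *o,
				bool snc_capable)
{
	assert(c && o);
	if (c->llc_id != o->llc_id)
		return false;

	/*
	 * Same LLC id but different nodes on an SNC-capable processor is
	 * the expected layout, so report "not LLC siblings" without warning.
	 */
	if (c->node != o->node && snc_capable)
		return false;

	return topology_sane_model(c, o);
}
```

In this model the cross-node pair is still treated as not sharing an LLC; the
only difference on a matching processor is that no warning is raised.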

The commit needs minor backports to Xenial 4.4 and Bionic 4.15; the only
changes are re-arranged #includes and small context fixups.

[Testcase]

Unfortunately, this is hardware specific. To test this, you need an Intel
Skylake server processor which supports Sub-NUMA-Clustering.

We have a customer with an Intel Xeon Gold 5120 CPU on an HP DL360 Gen10 that
has successfully tested the below test kernels, with good results.

Xenial 4.4 ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf280048-test-ga

Xenial 4.15 HWE ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf280048-test-hwe

Running the test kernel, the oops does not reproduce:

smp: Bringing up secondary CPUs ...
x86: Booting SMP configuration:
.... node #0, CPUs: #1
NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
#2 #3 #4 #5 #6
.... node #1, CPUs: #7 #8 #9 #10 #11 #12 #13
.... node #0, CPUs: #14 #15 #16 #17 #18 #19 #20
.... node #1, CPUs: #21 #22 #23 #24 #25 #26 #27
smp: Brought up 2 nodes, 28 CPUs
smpboot: Max logical packages: 1
smpboot: Total of 28 processors activated
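
For anyone who wants to check a saved boot log mechanically rather than by
eye, a small helper along these lines may be enough. It is only a sketch: the
function names are hypothetical, and it simply looks for the two phrases from
the "llc-sibling" message quoted in the [Impact] section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a kernel log line carries the llc-sibling complaint. */
bool is_llc_sibling_warning(const char *line)
{
	assert(line);
	return strstr(line, "llc-sibling") != NULL &&
	       strstr(line, "is not on the same node") != NULL;
}

/* Count matching lines in a stream, e.g. a saved copy of the boot log. */
int count_llc_sibling_warnings(FILE *log)
{
	char buf[1024];
	int hits = 0;

	assert(log);
	while (fgets(buf, sizeof(buf), log))
		if (is_llc_sibling_warning(buf))
			hits++;
	return hits;
}
```

Called on a file opened from the saved output of dmesg, a count of zero is
what the test kernels above would be expected to give.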

[Regression Potential]

The commit modifies a small section of smpboot code, which every machine will
execute on boot. The majority of the commit breaks up a large if statement into
smaller blocks, and adds an extra if statement to check for a specific
processor family.

If a regression were to occur, some machines would wrongly make or skip their
calls to topology_sane(), which in the worst case would result in an oops
message and slightly degraded performance. The system would still function
normally.

The commit has been present since 4.17-rc2 and is present in Eoan and Focal.
There are no fixup commits, and no additional processor families have been
added since.

Because the changes amount to a small re-arrangement of logic and the addition
of a processor family check, they are fairly minor, and I don't think they will
cause any regressions.

Alison Schofield (1):
  x86,sched: Allow topologies where NUMA nodes share an LLC

 arch/x86/kernel/smpboot.c | 45 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

Comments

Khalid Elmously June 30, 2020, 3:08 a.m. UTC | #1
On 2020-06-08 16:18:13 , Matthew Ruffell wrote: