diff mbox series

[RFC,5/5] powerpc/smp: Remap boot CPU onto core 0 if >= nr_cpu_ids

Message ID 20231229120107.2281153-5-mpe@ellerman.id.au (mailing list archive)
State Accepted
Commit 0875f1ceba974042069f04946aa8f1d4d1e688da
Headers show
Series [RFC,1/5] powerpc/smp: Adjust nr_cpu_ids to cover all threads of a core | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.

Commit Message

Michael Ellerman Dec. 29, 2023, 12:01 p.m. UTC
If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU
onto logical core 0.

This is achieved in two stages. In early_init_dt_scan_cpus() the boot
CPU is renumbered to be on logical core 0, and the original boot core's
hardware ID is recorded.

Later in smp_setup_cpu_maps(), if the original boot core ID is set, the
logical CPU numbers on the 0th core are skipped in the normal device
tree search over CPU device tree nodes. Then the search is continued
until the device tree node matching the boot core is found, and those
CPUs are assigned the CPU numbers starting at 0.

This allows kdump kernels to be booted with low values for nr_cpu_ids
to conserve memory, while also allowing the crashing/boot CPU to be
any CPU.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/smp.h     |  1 +
 arch/powerpc/kernel/prom.c         | 16 +++++++++++-----
 arch/powerpc/kernel/setup-common.c | 19 +++++++++++++++++--
 3 files changed, 29 insertions(+), 7 deletions(-)

Comments

Michael Ellerman Dec. 29, 2023, 12:07 p.m. UTC | #1
Michael Ellerman <mpe@ellerman.id.au> writes:
> If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU
> onto logical core 0.

Hi guys,

I finally got time to look at this issue. I think this series should fix
the problems that have been seen. I've tested this fairly thoroughly
with a qemu script, and also a few boots on a real machine.

If you can test it with your setups that would be great. Hopefully there
isn't some obscure case I've missed.

cheers
Pingfan Liu Jan. 2, 2024, 12:51 a.m. UTC | #2
On Fri, Dec 29, 2023 at 8:07 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU
> > onto logical core 0.
>
> Hi guys,
>
> I finally got time to look at this issue. I think this series should fix

Thanks a lot for sparing time on it and hope we can close this
prolonged issue soon.

And loop in Wen Xiong and Ming Lei, who care for this issue too.

Best Regards,

Pingfan

> the problems that have been seen. I've tested this fairly thoroughly
> with a qemu script, and also a few boots on a real machine.
>
> If you can test it with your setups that would be great. Hopefully there
> isn't some obscure case I've missed.
>
> cheers
>
Aneesh Kumar K.V (IBM) Jan. 2, 2024, 4:46 a.m. UTC | #3
Michael Ellerman <mpe@ellerman.id.au> writes:

....

>  #ifdef CONFIG_PPC64
>  int boot_cpu_hwid = -1;
> @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void)
>  			avail = !of_property_match_string(dn,
>  					"enable-method", "spin-table");
>  
> -		cpu = assign_threads(cpu, nthreads, avail, intserv);
> +		if (boot_core_hwid >= 0) {
> +			if (cpu == 0) {
> +				pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn);
> +				cpu = nthreads;
> +				continue;
> +			}
>  
> -		if (cpu >= nr_cpu_ids) {
> +			if (be32_to_cpu(intserv[0]) == boot_core_hwid) {
> +				pr_info("Renumbered boot core %pOF to logical 0\n", dn);
> +				assign_threads(0, nthreads, avail, intserv);
> +				of_node_put(dn);
> +				break;
>

I was expecting a 'continue' here. Why 'break' the loop? The condition that
should break the loop should be cpu >= nr_cpu_ids 


> +			}
> +		} else if (cpu >= nr_cpu_ids) {
>  			of_node_put(dn);
>  			break;
>  		}
> +
> +		if (cpu < nr_cpu_ids)
> +			cpu = assign_threads(cpu, nthreads, avail, intserv);
>  	}
>  
>  	/* If no SMT supported, nthreads is forced to 1 */
> -- 
> 2.43.0

-aneesh
Jiri Bohac Feb. 9, 2024, 3:56 p.m. UTC | #4
On Tue, Jan 02, 2024 at 10:16:04AM +0530, Aneesh Kumar K.V wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> 
> ....
> 
> >  #ifdef CONFIG_PPC64
> >  int boot_cpu_hwid = -1;
> > @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void)
> >  			avail = !of_property_match_string(dn,
> >  					"enable-method", "spin-table");
> >  
> > -		cpu = assign_threads(cpu, nthreads, avail, intserv);
> > +		if (boot_core_hwid >= 0) {
> > +			if (cpu == 0) {
> > +				pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn);
> > +				cpu = nthreads;
> > +				continue;
> > +			}
> >  
> > -		if (cpu >= nr_cpu_ids) {
> > +			if (be32_to_cpu(intserv[0]) == boot_core_hwid) {
> > +				pr_info("Renumbered boot core %pOF to logical 0\n", dn);
> > +				assign_threads(0, nthreads, avail, intserv);
> > +				of_node_put(dn);
> > +				break;
> >
> 
> I was expecting a 'continue' here. Why 'break' the loop? The condition that
> should break the loop should be cpu >= nr_cpu_ids 

No, the patch seems correct:

We're in the "if (boot_core_hwid >= 0)" branch, meaning that it
was determined by early_init_dt_scan_cpus() that boot_cpuid >=
nr_cpu_ids. So we loop until we get to the boot CPU, so it can be
renumbered to 0. Once we do that we break, because we
know we are already past nr_cpu_ids - otherwise boot_core_hwid
would not be >= 0. 


> > +			}
> > +		} else if (cpu >= nr_cpu_ids) {
> >  			of_node_put(dn);
> >  			break;
> >  		}

Here is what you expected - in case the boot CPU was < nr_cpu_ids
we break as soon as nr_cpu_ids is reached.

> > +
> > +		if (cpu < nr_cpu_ids)

this ensures that CPUs between nr_cpu_ids and the boot CPU are
correctly ignored in case we're already past nr_cpu_ids and only
scanning further to find the boot CPU to be renumbered to 0
Wen Xiong Feb. 13, 2024, 8:16 p.m. UTC | #5
>>And loop in Wen Xiong and Ming Lei, who care for this issue too.


Hi Pingfan, Thanks for your email!

Hi Michael,

Thanks for your new patchset!

In past month, Our several test team have modified /etc/sysconfig/kdump file with nr_cpus=1, triggered kdump over the following IO devices and  kdump works fine with generating vmcore file.

Test kernel:  the latest upstream kernel + your patchset
                      the latest rhel94 kernel + backport of your patchset

Test hardware: Nvme drives, FC adapter, NVmf-over-FC

Test device drivers: nvme, lpfc, nvme-fc

Thanks for your work! Please consideration for inclusion in upstream kernel.

Thanks for your help!
Wendy

> the problems that have been seen. I've tested this fairly thoroughly 
> with a qemu script, and also a few boots on a real machine.
>
> If you can test it with your setups that would be great. Hopefully 
> there isn't some obscure case I've missed.
>
> cheers
>
Michael Ellerman Feb. 14, 2024, 1:12 p.m. UTC | #6
Jiri Bohac <jbohac@suse.cz> writes:
> On Tue, Jan 02, 2024 at 10:16:04AM +0530, Aneesh Kumar K.V wrote:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>> 
>> ....
>> 
>> >  #ifdef CONFIG_PPC64
>> >  int boot_cpu_hwid = -1;
>> > @@ -492,12 +493,26 @@ void __init smp_setup_cpu_maps(void)
>> >  			avail = !of_property_match_string(dn,
>> >  					"enable-method", "spin-table");
>> >  
>> > -		cpu = assign_threads(cpu, nthreads, avail, intserv);
>> > +		if (boot_core_hwid >= 0) {
>> > +			if (cpu == 0) {
>> > +				pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn);
>> > +				cpu = nthreads;
>> > +				continue;
>> > +			}
>> >  
>> > -		if (cpu >= nr_cpu_ids) {
>> > +			if (be32_to_cpu(intserv[0]) == boot_core_hwid) {
>> > +				pr_info("Renumbered boot core %pOF to logical 0\n", dn);
>> > +				assign_threads(0, nthreads, avail, intserv);
>> > +				of_node_put(dn);
>> > +				break;
>> >
>> 
>> I was expecting a 'continue' here. Why 'break' the loop? The condition that
>> should break the loop should be cpu >= nr_cpu_ids 
>
> No, the patch seems correct:
>
> We're in the "if (boot_core_hwid >= 0)" branch, meaning that it
> was determined by early_init_dt_scan_cpus() that boot_cpuid >=
> nr_cpu_ids. So we loop until we get to the boot CPU, so it can be
> renumbered to 0. Once we do that we break, because we
> know we are already past nr_cpu_ids - otherwise boot_core_hwid
> would not be >= 0. 

Yes that's exactly right.

Thanks for answering for me (was on leave and still catching up).

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index aaaa576d0e15..b77927ccb0ab 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -27,6 +27,7 @@ 
 
 extern int boot_cpuid;
 extern int boot_cpu_hwid; /* PPC64 only */
+extern int boot_core_hwid;
 extern int spinning_secondaries;
 extern u32 *cpu_to_phys_id;
 extern bool coregroup_enabled;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 77364729a1b6..af2b70585b2e 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -368,8 +368,6 @@  static int __init early_init_dt_scan_cpus(unsigned long node,
 	if (found < 0)
 		return 0;
 
-	DBG("boot cpu: logical %d physical %d\n", found,
-	    be32_to_cpu(intserv[found_thread]));
 	boot_cpuid = found;
 
 	if (IS_ENABLED(CONFIG_PPC64))
@@ -382,11 +380,19 @@  static int __init early_init_dt_scan_cpus(unsigned long node,
 	}
 
 	if (boot_cpuid >= nr_cpu_ids) {
-		set_nr_cpu_ids(min(CONFIG_NR_CPUS, ALIGN(boot_cpuid + 1, nthreads)));
-		pr_warn("Boot CPU %d >= nr_cpu_ids, adjusted nr_cpu_ids to %d\n",
-			boot_cpuid, nr_cpu_ids);
+		// Remember boot core for smp_setup_cpu_maps()
+		boot_core_hwid = be32_to_cpu(intserv[0]);
+		
+		pr_warn("Boot CPU %d (core hwid %d) >= nr_cpu_ids, adjusted boot CPU to %d\n",
+			boot_cpuid, boot_core_hwid, found_thread);
+
+		// Adjust boot CPU to appear on logical core 0
+		boot_cpuid = found_thread;
 	}
 
+	DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+	    be32_to_cpu(intserv[found_thread]));
+
 	/*
 	 * PAPR defines "logical" PVR values for cpus that
 	 * meet various levels of the architecture:
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d9f8ed9bd2fc..5844f3d113a5 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -85,6 +85,7 @@  EXPORT_SYMBOL(machine_id);
 
 int boot_cpuid = -1;
 EXPORT_SYMBOL_GPL(boot_cpuid);
+int __initdata boot_core_hwid = -1;
 
 #ifdef CONFIG_PPC64
 int boot_cpu_hwid = -1;
@@ -492,12 +493,26 @@  void __init smp_setup_cpu_maps(void)
 			avail = !of_property_match_string(dn,
 					"enable-method", "spin-table");
 
-		cpu = assign_threads(cpu, nthreads, avail, intserv);
+		if (boot_core_hwid >= 0) {
+			if (cpu == 0) {
+				pr_info("Skipping CPU node %pOF to allow for boot core.\n", dn);
+				cpu = nthreads;
+				continue;
+			}
 
-		if (cpu >= nr_cpu_ids) {
+			if (be32_to_cpu(intserv[0]) == boot_core_hwid) {
+				pr_info("Renumbered boot core %pOF to logical 0\n", dn);
+				assign_threads(0, nthreads, avail, intserv);
+				of_node_put(dn);
+				break;
+			}
+		} else if (cpu >= nr_cpu_ids) {
 			of_node_put(dn);
 			break;
 		}
+
+		if (cpu < nr_cpu_ids)
+			cpu = assign_threads(cpu, nthreads, avail, intserv);
 	}
 
 	/* If no SMT supported, nthreads is forced to 1 */