powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration

Submitted by Vaidyanathan Srinivasan on March 17, 2017, 6:05 p.m.

Details

Message ID 20170317180550.9931-1-svaidy@linux.vnet.ibm.com
State Changes Requested
Headers show

Commit Message

Vaidyanathan Srinivasan March 17, 2017, 6:05 p.m.
drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
This breaks cpuidle on powernv where sysfs files are not created for
cpus in cpu_possible_mask that cannot be hot-added.

Trying cpuidle_register_device() on cpu without sysfs node will
cause crash like:

cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
   msr: 9000000000009033
   dar: 6000000060000000
  current = 0xc000000ff1480000
  paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/8
Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4 (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
enter ? for help
[link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
[c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
[c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
[c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
[c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
[c000000ff15038c0] c000000000619694 printk+0x38/0x4c
[c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
[c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
[c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
[c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
[c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
[c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
[c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
[c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
[c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
[c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
[c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

This patch fixes the issue by passing correct cpumask from
powernv-cpuidle driver.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle-powernv.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Michael Neuling March 18, 2017, 5:28 a.m.
Vaidy,

Thanks for fixing this.

> drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
> This breaks cpuidle on powernv where sysfs files are not created for
> cpus in cpu_possible_mask that cannot be hot-added.

I think I prefer the longer description below than this.

> This patch fixes the issue by passing correct cpumask from
> powernv-cpuidle driver.

Any reason the correct fix isn't to change __cpuidle_driver_init() to use
present mask?  Seems like any arch where present < possible is going to hit
this.

> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Can we CC stable too.  This breaks at least v4.10.

> ---
>  drivers/cpuidle/cpuidle-powernv.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
Vaidyanathan Srinivasan March 18, 2017, 6:53 a.m.
* Michael Neuling <mikey@neuling.org> [2017-03-18 16:28:02]:

> Vaidy,
> 
> Thanks for fixing this.
> 
> > drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
> > This breaks cpuidle on powernv where sysfs files are not created for
> > cpus in cpu_possible_mask that cannot be hot-added.
> 
> I think I prefer the longer description below than this.

[PATCH] powerpc/powernv/cpuidle: Pass correct drv->cpumask for

drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
This breaks cpuidle on powernv where sysfs files are not created for
cpus in cpu_possible_mask that cannot be hot-added.

On powernv platform cpu_present could be less than cpu_possible
in cases where firmware detects the cpu, but it is not available
for OS.  Such cpus are not hotplugable at runtime on powernv and
hence we skip creating sysfs files for these cpus.

Trying cpuidle_register_device() on cpu without sysfs node will
cause crash like:

cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
   msr: 9000000000009033
   dar: 6000000060000000
  current = 0xc000000ff1480000
  paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/8
Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4 (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
enter ? for help
[link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
[c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
[c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
[c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
[c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
[c000000ff15038c0] c000000000619694 printk+0x38/0x4c
[c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
[c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
[c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
[c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
[c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
[c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
[c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
[c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
[c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
[c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
[c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

This patch fixes the issue by passing correct cpumask from
powernv-cpuidle driver.
 
> > This patch fixes the issue by passing correct cpumask from
> > powernv-cpuidle driver.
> 
> Any reason the correct fix isn't to change __cpuidle_driver_init() to use
> present mask?  Seems like any arch where present < possible is going to hit
> this.

How to handle the sysfs nodes for cpus that are not in cpu_present
mask is upto the arch/platform.  During init, arch can choose to
create sysfs nodes for cpu's that are not in present mask, which can
be later hot-added and then onlined.  In which case cpuidle driver
should register for such cpus as well.

The decision to allow a cpu to become available in cpu_present mask at
runtime and handle the sysfs nodes are upto the arch/platform and not
to be hard coded in generic driver.  The generic cpuidle would like to
register for all possible cpus if the arch driver does not override
it.

--Vaidy
Benjamin Herrenschmidt March 19, 2017, 5:55 a.m.
On Fri, 2017-03-17 at 23:35 +0530, Vaidyanathan Srinivasan wrote:
> +       /*
> +        * On PowerNV platform cpu_present may be less that cpu_possible
> +        * in cases where firmware detects the cpu, but it is not available
> +        * for OS.  Such CPUs are not hotplugable at runtime on PowerNV
> +        * platform and hence sysfs files are not created for those.
> +        * Generic topology_init() would skip creating sysfs directories
> +        * for cpus that are not present and not hotplugable later at
> +        * runtime.
> +        *
> + 

I wouldn't assume that they can't later be hotplugged. We need that fix
now as it's hitting machines today but we'll need to improve all that
code and the topology code to deal with hotplugging gracefully.

Cheers,
Ben.
Michael Ellerman March 20, 2017, 3:05 a.m.
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:

> * Michael Neuling <mikey@neuling.org> [2017-03-18 16:28:02]:
>
>> Vaidy,
>> 
>> Thanks for fixing this.
>> 
>> > drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
>> > This breaks cpuidle on powernv where sysfs files are not created for
>> > cpus in cpu_possible_mask that cannot be hot-added.
>> 
>> I think I prefer the longer description below than this.
>
> [PATCH] powerpc/powernv/cpuidle: Pass correct drv->cpumask for
>
> drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
> This breaks cpuidle on powernv where sysfs files are not created for
> cpus in cpu_possible_mask that cannot be hot-added.

I'm confused.

> On powernv platform cpu_present could be less than cpu_possible
> in cases where firmware detects the cpu, but it is not available
> for OS.

It's entirely normal for present < possible, on my laptop for example,
so I don't see how that causes the bug.

> Such cpus are not hotplugable at runtime on powernv and
> hence we skip creating sysfs files for these cpus.

Why are they not hotpluggable? Looking at topology_init() they should be
hotpluggable as long as ppc_md.cpu_die is populated, which it is on
PowerNV AFAICS.

Is it really "creating sysfs files" that's important, or is that we
don't call register_cpu() for those CPUs?

> Trying cpuidle_register_device() on cpu without sysfs node will
> cause crash like:
>
> cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
>     pc: c00000000022c8bc: string+0x34/0x60
>     lr: c00000000022ed78: vsnprintf+0x284/0x42c
>     sp: c000000ff1503710
>    msr: 9000000000009033
>    dar: 6000000060000000
>   current = 0xc000000ff1480000
>   paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
>     pid   = 1, comm = swapper/8
> Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4 (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
> enter ? for help
> [link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
> [c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
> [c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
> [c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
> [c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
> [c000000ff15038c0] c000000000619694 printk+0x38/0x4c
> [c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
> [c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
> [c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
> [c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
> [c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
> [c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
> [c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
> [c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
> [c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
> [c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
> [c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

I really don't understand how a CPU not being present leads to a crash
in printf()? Something in that call chain should have checked that the
CPU was registered before crashing in printf() - surely?

cheers
Vaidyanathan Srinivasan March 20, 2017, 4:31 a.m.
* Michael Ellerman <mpe@ellerman.id.au> [2017-03-20 14:05:39]:

> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
> 
> > * Michael Neuling <mikey@neuling.org> [2017-03-18 16:28:02]:
> >
> >> Vaidy,
> >> 
> >> Thanks for fixing this.
> >> 
> >> > drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
> >> > This breaks cpuidle on powernv where sysfs files are not created for
> >> > cpus in cpu_possible_mask that cannot be hot-added.
> >> 
> >> I think I prefer the longer description below than this.
> >
> > [PATCH] powerpc/powernv/cpuidle: Pass correct drv->cpumask for
> >
> > drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
> > This breaks cpuidle on powernv where sysfs files are not created for
> > cpus in cpu_possible_mask that cannot be hot-added.
> 
> I'm confused.

It depends on CONFIG_HOTPLUG_CPU now, but we have the following
combinations to handle:
(a) CONFIG_HOTPLUG_CPU=y/n
(b) pseries vs powernv
 
> > On powernv platform cpu_present could be less than cpu_possible
> > in cases where firmware detects the cpu, but it is not available
> > for OS.
> 
> It's entirely normal for present < possible, on my laptop for example,
> so I don't see how that causes the bug.

Yes, present < possible in itself not a problem.  It is whether
cpu_device exist for that cpu or not.

> > Such cpus are not hotplugable at runtime on powernv and
> > hence we skip creating sysfs files for these cpus.
> 
> Why are they not hotpluggable? Looking at topology_init() they should be
> hotpluggable as long as ppc_md.cpu_die is populated, which it is on
> PowerNV AFAICS.

Currently it depends on CONFIG_HOTPLUG_CPU=y.
 
> Is it really "creating sysfs files" that's important, or is that we
> don't call register_cpu() for those CPUs?

Currently if CONFIG_HOTPLUG_CPU=n, then we skip calling register_cpu()
and that causes the problem.

The fix in cpuidle-powernv.c could be based on CONFIG_HOTPLUG_CPU, but
since we can choose to set cpu->hotpluggable based on other factors
even when CONFIG_HOTPLUG_CPU=y, I choose to use cpu_possible.  We will
need to change  cpuidle-powernv to register for additional cpus when
we really support hot-add of cpus that did not exist at boot.

> > Trying cpuidle_register_device() on cpu without sysfs node will
> > cause crash like:
> >
> > cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
> >     pc: c00000000022c8bc: string+0x34/0x60
> >     lr: c00000000022ed78: vsnprintf+0x284/0x42c
> >     sp: c000000ff1503710
> >    msr: 9000000000009033
> >    dar: 6000000060000000
> >   current = 0xc000000ff1480000
> >   paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
> >     pid   = 1, comm = swapper/8
> > Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4 (Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
> > enter ? for help
> > [link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
> > [c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
> > [c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
> > [c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
> > [c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
> > [c000000ff15038c0] c000000000619694 printk+0x38/0x4c
> > [c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
> > [c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
> > [c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
> > [c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
> > [c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
> > [c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
> > [c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
> > [c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
> > [c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
> > [c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
> > [c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78
> 
> I really don't understand how a CPU not being present leads to a crash
> in printf()? Something in that call chain should have checked that the
> CPU was registered before crashing in printf() - surely?

Yes, we should have just failed to register the cpuidle driver.  I have
the fix here:

[PATCH] cpuidle: Validate cpu_dev in cpuidle_add_sysfs
http://patchwork.ozlabs.org/patch/740634/

--Vaidy
Michael Ellerman March 22, 2017, 10:55 a.m.
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
> * Michael Ellerman <mpe@ellerman.id.au> [2017-03-20 14:05:39]:
>> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>  
>> > On powernv platform cpu_present could be less than cpu_possible
>> > in cases where firmware detects the cpu, but it is not available
>> > for OS.
>> 
>> It's entirely normal for present < possible, on my laptop for example,
>> so I don't see how that causes the bug.
>
> Yes, present < possible in itself not a problem.  It is whether
> cpu_device exist for that cpu or not.
...
>
> Currently if CONFIG_HOTPLUG_CPU=n, then we skip calling register_cpu()
> and that causes the problem.
...
>> 
>> I really don't understand how a CPU not being present leads to a crash
>> in printf()? Something in that call chain should have checked that the
>> CPU was registered before crashing in printf() - surely?
>
> Yes, we should have just failed to register the cpuidle driver.  I have
> the fix here:
>
> [PATCH] cpuidle: Validate cpu_dev in cpuidle_add_sysfs
> http://patchwork.ozlabs.org/patch/740634/

OK. Can you send a v2 of this with a better change log that includes all
the clarifications above.

And despite your subject being powerpc/powernv/cpuidle, this is a
cpuidle patch. I can merge it, but I at least need you to Cc the cpuidle
maintainers so they have a chance to see it.

cheers
Vaidyanathan Srinivasan March 23, 2017, 3:59 a.m.
* Michael Ellerman <mpe@ellerman.id.au> [2017-03-22 21:55:50]:

> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
> > * Michael Ellerman <mpe@ellerman.id.au> [2017-03-20 14:05:39]:
> >> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
> >  
> >> > On powernv platform cpu_present could be less than cpu_possible
> >> > in cases where firmware detects the cpu, but it is not available
> >> > for OS.
> >> 
> >> It's entirely normal for present < possible, on my laptop for example,
> >> so I don't see how that causes the bug.
> >
> > Yes, present < possible in itself not a problem.  It is whether
> > cpu_device exist for that cpu or not.
> ...
> >
> > Currently if CONFIG_HOTPLUG_CPU=n, then we skip calling register_cpu()
> > and that causes the problem.
> ...
> >> 
> >> I really don't understand how a CPU not being present leads to a crash
> >> in printf()? Something in that call chain should have checked that the
> >> CPU was registered before crashing in printf() - surely?
> >
> > Yes, we should have just failed to register the cpuidle driver.  I have
> > the fix here:
> >
> > [PATCH] cpuidle: Validate cpu_dev in cpuidle_add_sysfs
> > http://patchwork.ozlabs.org/patch/740634/
> 
> OK. Can you send a v2 of this with a better change log that includes all
> the clarifications above.
> 
> And despite your subject being powerpc/powernv/cpuidle, this is a
> cpuidle patch. I can merge it, but I at least need you to Cc the cpuidle
> maintainers so they have a chance to see it.

Thanks for the review, I will post a v2 with more detailed commit log
and CC cpuidle maintainers and linux-pm.

--Vaidy

Patch hide | download patch | download mbox

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 3705930..e7a8c2a 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -175,6 +175,28 @@  static int powernv_cpuidle_driver_init(void)
 		drv->state_count += 1;
 	}
 
+	/*
+	 * On PowerNV platform cpu_present may be less that cpu_possible
+	 * in cases where firmware detects the cpu, but it is not available
+	 * for OS.  Such CPUs are not hotplugable at runtime on PowerNV
+	 * platform and hence sysfs files are not created for those.
+	 * Generic topology_init() would skip creating sysfs directories
+	 * for cpus that are not present and not hotplugable later at
+	 * runtime.
+	 *
+	 * drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
+	 * This breaks cpuidle on powernv where sysfs files are not created for
+	 * cpus in cpu_possible_mask that cannot be hot-added.
+	 *
+	 * Hence at runtime sysfs nodes are present for cpus only in
+	 * cpu_present_mask. Trying cpuidle_register_device() on cpu without
+	 * sysfs node is incorrect.
+	 *
+	 * Hence pass correct cpu mask to generic cpuidle driver.
+	 */
+
+	drv->cpumask = (struct cpumask *)cpu_present_mask;
+
 	return 0;
 }