| Message ID | c284ee977a3d52ddd5c01638be391e24b7a59b3d.1465311052.git.ego@linux.vnet.ibm.com (mailing list archive) |
|---|---|
| State | Superseded |
Hi Gautham,

Thanks a lot for the fix. With your patches applied, 4.7.0-rc2 builds fine
on ppc64le bare metal. Boot was successful with no call traces.

Thanks for all your support!

Regards,
Abdul

On Tuesday 07 June 2016 08:44 PM, Gautham R. Shenoy wrote:
> With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
> run on online && !active"), __set_cpus_allowed_ptr() expects that only
> strict per-cpu kernel threads can have affinity to an online CPU which
> is not yet active.
>
> This assumption is currently broken in the CPU_ONLINE notification
> handler for the workqueues where restore_unbound_workers_cpumask()
> calls set_cpus_allowed_ptr() when the first cpu in the unbound
> worker's pool->attr->cpumask comes online. Since
> set_cpus_allowed_ptr() is called with pool->attr->cpumask in which
> only one CPU is online which is not yet active, we get the following
> WARN_ON during a CPU online operation.
>
> ------------[ cut here ]------------
> WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
> __set_cpus_allowed_ptr+0x228/0x2e0
> Modules linked in:
> CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
> <..snip..>
> Call Trace:
> [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
> [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
> [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
> [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
> [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
> [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
> [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
> [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
> [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
> [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
> [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
> ---[ end trace 00f1456578b2a3b2 ]---
> This patch sets the affinity of the worker to
> a) the only online CPU in the cpumask of the worker pool when it comes
>    online.
> b) the cpumask of the worker pool when the second CPU in the pool's
>    cpumask comes online.
>
> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tejun Heo <htejun@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
On Tue, Jun 07, 2016 at 08:44:03PM +0530, Gautham R. Shenoy wrote:

I'm still puzzled why we don't see this on x86. Afaict there's nothing
PPC specific about this.

> This patch sets the affinity of the worker to
> a) the only online CPU in the cpumask of the worker pool when it comes
>    online.
> b) the cpumask of the worker pool when the second CPU in the pool's
>    cpumask comes online.

This basically works around the WARN conditions, which I suppose is fair
enough, but I would like a note here to revisit this once the whole cpu
hotplug rework has settled.

The real problem is that workqueues seem to want to create worker
threads before there's anybody who would use them or something like
that. Or is that what PPC does funny? Use an unbound workqueue this
early in cpu bringup?

> @@ -4600,15 +4600,26 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
> 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
> 		return;
>
> -	/* is @cpu the only online CPU? */
> 	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
> -	if (cpumask_weight(&cpumask) != 1)
> +
> +	/*
> +	 * The affinity needs to be set
> +	 * a) to @cpu when that is the only online CPU in
> +	 *    pool->attrs->cpumask.
> +	 * b) to pool->attrs->cpumask when exactly two CPUs in
> +	 *    pool->attrs->cpumask are online. This affinity will be
> +	 *    retained when subsequent CPUs come online.
> +	 */
> +	if (cpumask_weight(&cpumask) > 2)
> 		return;
>
> +	if (cpumask_weight(&cpumask) == 2)
> +		cpumask_copy(&cpumask, pool->attrs->cpumask);
> +
> 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
> 	for_each_pool_worker(worker, pool)
> 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
> -						  pool->attrs->cpumask) < 0);
> +						  &cpumask) < 0);
> }
Hi Peter,

On Tue, Jun 14, 2016 at 01:22:34PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 07, 2016 at 08:44:03PM +0530, Gautham R. Shenoy wrote:
>
> I'm still puzzled why we don't see this on x86. Afaict there's nothing
> PPC specific about this.

You are right. On PPC, at boot time we hit the WARN_ON like once in 5
times. Using some debug prints, I have verified that these are instances
when the workqueue subsystem gets initialized before all the CPUs come
online. On x86, I have never been able to hit this, since it appears
that every time the workqueues get initialized only after all the CPUs
have come online.

PPC doesn't use any specific unbound workqueue early in the boot. The
unbound workqueues causing the WARN_ON() were the "events_unbound"
workqueue which was created by workqueue_init().

=================================================================================
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-127. online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-31. online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 32-63. online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 64-95. online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 96-127. online mask 0
=================================================================================

Also, with the first patch in the series (which ensures that
restore_unbound_workers are called *after* the new workers for the newly
onlined CPUs are created) and without this one, you can reproduce this
WARN_ON on both x86 and PPC by offlining all the CPUs of a node and
bringing just one of them online.

So essentially the BUG fixed by the previous patch is currently hiding
this BUG, which is why we are not able to reproduce this WARN_ON() with
CPU hotplug once the system has booted.

> > This patch sets the affinity of the worker to
> > a) the only online CPU in the cpumask of the worker pool when it comes
> >    online.
> > b) the cpumask of the worker pool when the second CPU in the pool's
> >    cpumask comes online.
>
> This basically works around the WARN conditions, which I suppose is fair
> enough, but I would like a note here to revisit this once the whole cpu
> hotplug rework has settled.

Sure.

> The real problem is that workqueues seem to want to create worker
> threads before there's anybody who would use them or something like
> that.

I am not sure about that. The workqueue creates unbound workers for a
node via wq_update_unbound_numa() whenever the first CPU of every node
comes online. So that seems legitimate. It then tries to affine these
workers to the cpumask of that node. Again this seems right. As an
optimization, it does this only when the first CPU of the node comes
online. Since this online CPU is not yet active, and since
nr_cpus_allowed > 1, we will hit the WARN_ON().

However, I agree with you that during boot-up, the workqueue subsystem
needs to create unbound worker threads for only the online CPUs (instead
of all possible CPUs as it currently does!) and let the CPU_ONLINE
notification take care of creating the remaining workers when they are
really required.

> Or is that what PPC does funny? Use an unbound workqueue this early in
> cpu bringup?

Like I pointed out above, PPC doesn't use an unbound workqueue early in
the CPU bring-up.

--
Thanks and Regards
gautham.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e412794..1199f73 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4586,7 +4586,7 @@ static void rebind_workers(struct worker_pool *pool)
  *
  * An unbound pool may end up with a cpumask which doesn't have any online
  * CPUs. When a worker of such pool get scheduled, the scheduler resets
- * its cpus_allowed. If @cpu is in @pool's cpumask which didn't have any
+ * its cpus_allowed. If @cpu is in @pool's cpumask which had at most one
  * online CPU before, cpus_allowed of all its workers should be restored.
  */
 static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
@@ -4600,15 +4600,26 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
 		return;
 
-	/* is @cpu the only online CPU? */
 	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
-	if (cpumask_weight(&cpumask) != 1)
+
+	/*
+	 * The affinity needs to be set
+	 * a) to @cpu when that is the only online CPU in
+	 *    pool->attrs->cpumask.
+	 * b) to pool->attrs->cpumask when exactly two CPUs in
+	 *    pool->attrs->cpumask are online. This affinity will be
+	 *    retained when subsequent CPUs come online.
+	 */
+	if (cpumask_weight(&cpumask) > 2)
 		return;
 
+	if (cpumask_weight(&cpumask) == 2)
+		cpumask_copy(&cpumask, pool->attrs->cpumask);
+
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+						  &cpumask) < 0);
 }
 
 /*
With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
run on online && !active"), __set_cpus_allowed_ptr() expects that only
strict per-cpu kernel threads can have affinity to an online CPU which
is not yet active.

This assumption is currently broken in the CPU_ONLINE notification
handler for the workqueues where restore_unbound_workers_cpumask()
calls set_cpus_allowed_ptr() when the first cpu in the unbound
worker's pool->attr->cpumask comes online. Since
set_cpus_allowed_ptr() is called with pool->attr->cpumask in which
only one CPU is online which is not yet active, we get the following
WARN_ON during a CPU online operation.

------------[ cut here ]------------
WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
__set_cpus_allowed_ptr+0x228/0x2e0
Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
<..snip..>
Call Trace:
[c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
[c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
[c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
[c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
[c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
[c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
[c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
[c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
[c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
[c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
---[ end trace 00f1456578b2a3b2 ]---

This patch sets the affinity of the worker to
a) the only online CPU in the cpumask of the worker pool when it comes
   online.
b) the cpumask of the worker pool when the second CPU in the pool's
   cpumask comes online.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 kernel/workqueue.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)