[2/2] workqueue: Fix affinity of an unbound worker of a node with 1 online CPU

Message ID c284ee977a3d52ddd5c01638be391e24b7a59b3d.1465311052.git.ego@linux.vnet.ibm.com (mailing list archive)
State Superseded

Commit Message

Gautham R Shenoy June 7, 2016, 3:14 p.m. UTC
With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
run on online && !active"), __set_cpus_allowed_ptr() expects that only
strict per-cpu kernel threads can have affinity to an online CPU which
is not yet active.
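
The check that fires is the kthread affinity sanity test which that
commit added to __set_cpus_allowed_ptr() in kernel/sched/core.c. The
snippet below is a paraphrase of that era's code, not a verbatim copy:

	if (p->flags & PF_KTHREAD) {
		/*
		 * Kernel threads ending up on an online && !active CPU
		 * must be strict per-CPU threads (nr_cpus_allowed == 1).
		 */
		WARN_ON(cpumask_intersects(new_mask, cpu_online_mask) &&
			!cpumask_intersects(new_mask, cpu_active_mask) &&
			p->nr_cpus_allowed != 1);
	}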

This assumption is currently broken in the CPU_ONLINE notification
handler for the workqueues, where restore_unbound_workers_cpumask()
calls set_cpus_allowed_ptr() when the first CPU in the unbound
worker's pool->attrs->cpumask comes online. Since
set_cpus_allowed_ptr() is called with pool->attrs->cpumask, in which
the only online CPU is not yet active, we get the following WARN_ON
during a CPU online operation.

------------[ cut here ]------------
WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
__set_cpus_allowed_ptr+0x228/0x2e0
Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
<..snip..>
Call Trace:
[c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
[c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
[c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
[c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
[c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
[c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
[c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
[c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
[c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
[c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
---[ end trace 00f1456578b2a3b2 ]---

This patch sets the affinity of the worker to
a) the only online CPU in the cpumask of the worker pool when it comes
   online.
b) the cpumask of the worker pool when the second CPU in the pool's
   cpumask comes online.
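
As an illustration (the CPU numbers are hypothetical): for a pool whose
attrs->cpumask spans CPUs 0-31, all initially offline, when CPU 5
comes online first its workers are affined to CPU 5 alone; when CPU 9
comes online next they are affined to the full 0-31 mask, which is
safe at that point because the mask intersects cpu_active_mask via
CPU 5.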

Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 kernel/workqueue.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Abdul Haleem June 8, 2016, 6:03 a.m. UTC | #1
Hi Gautham,

Thanks a lot for the fix.

With your patches applied, 4.7.0-rc2 builds fine on ppc64le bare metal.
Boot was successful with no call traces.

Thanks for all your support!

Regards,
Abdul

On Tuesday 07 June 2016 08:44 PM, Gautham R. Shenoy wrote:

> <..snip..>
Peter Zijlstra June 14, 2016, 11:22 a.m. UTC | #2
On Tue, Jun 07, 2016 at 08:44:03PM +0530, Gautham R. Shenoy wrote:

I'm still puzzled why we don't see this on x86. Afaict there's nothing
PPC specific about this.

> This patch sets the affinity of the worker to
> a) the only online CPU in the cpumask of the worker pool when it comes
>    online.
> b) the cpumask of the worker pool when the second CPU in the pool's
>    cpumask comes online.

This basically works around the WARN conditions, which I suppose is fair
enough, but I would like a note here to revisit this once the whole cpu
hotplug rework has settled.

The real problem is that workqueues seem to want to create worker
threads before there's anybody who would use them or something like
that.

Or is that what PPC does funny? Use an unbound workqueue this early in
cpu bringup?

> @@ -4600,15 +4600,26 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
>  	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
>  		return;
>  
> -	/* is @cpu the only online CPU? */
>  	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
> -	if (cpumask_weight(&cpumask) != 1)
> +
> +	/*
> +	 * The affinity needs to be set
> +	 * a) to @cpu when that is the only online CPU in
> +	 *    pool->attrs->cpumask.
> +	 * b) to pool->attrs->cpumask when exactly two CPUs in
> +	 *    pool->attrs->cpumask are online. This affinity will be
> +	 *    retained when subsequent CPUs come online.
> +	 */
> +	if (cpumask_weight(&cpumask) > 2)
>  		return;
>  
> +	if (cpumask_weight(&cpumask) == 2)
> +		cpumask_copy(&cpumask, pool->attrs->cpumask);
> +
>  	/* as we're called from CPU_ONLINE, the following shouldn't fail */
>  	for_each_pool_worker(worker, pool)
>  		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
> -						  pool->attrs->cpumask) < 0);
> +						  &cpumask) < 0);
>  }
Gautham R Shenoy June 15, 2016, 10:19 a.m. UTC | #3
Hi Peter,

On Tue, Jun 14, 2016 at 01:22:34PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 07, 2016 at 08:44:03PM +0530, Gautham R. Shenoy wrote:
> 
> I'm still puzzled why we don't see this on x86. Afaict there's nothing
> PPC specific about this.

You are right. On PPC, at boot time we hit the WARN_ON about once in
5 boots. Using some debug prints, I have verified that these are
instances where the workqueue subsystem gets initialized before all
the CPUs come online. On x86, I have never been able to hit this,
since the workqueues there appear to get initialized only after all
the CPUs have come online.

PPC doesn't use any specific unbound workqueue early in boot. The
unbound workqueue causing the WARN_ON() was the "events_unbound"
workqueue created by workqueue_init().

=================================================================================
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-127.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-31.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 32-63.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 64-95.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 96-127.
     online mask 0
=================================================================================

Also, with the first patch in the series (which ensures that
restore_unbound_workers_cpumask() is called *after* the new workers
for the newly onlined CPUs are created) and without this one, you can
reproduce this WARN_ON() on both x86 and PPC by offlining all the
CPUs of a node and bringing just one of them back online. So
essentially the bug fixed by the previous patch is currently hiding
this one, which is why we are not able to reproduce this WARN_ON()
via CPU hotplug once the system has booted.

> 
> > This patch sets the affinity of the worker to
> > a) the only online CPU in the cpumask of the worker pool when it comes
> >    online.
> > b) the cpumask of the worker pool when the second CPU in the pool's
> >    cpumask comes online.
> 
> This basically works around the WARN conditions, which I suppose is fair
> enough, but I would like a note here to revisit this once the whole cpu
> hotplug rework has settled.
> 

Sure.

> The real problem is that workqueues seem to want to create worker
> threads before there's anybody who would use them or something like
> that.

I am not sure about that. The workqueue subsystem creates unbound
workers for a node via wq_update_unbound_numa() whenever the first
CPU of that node comes online. That seems legitimate. It then tries
to affine these workers to the cpumask of that node, which again
seems right. As an optimization, it does this only when the first CPU
of the node comes online. Since this online CPU is not yet active,
and since nr_cpus_allowed > 1, we hit the WARN_ON().
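
For context, the CPU_ONLINE path that reaches
restore_unbound_workers_cpumask() looks roughly like the sketch below
(a paraphrase of the 4.6-era workqueue_cpu_up_callback(), before the
reordering done by patch 1/2 of this series; not a verbatim copy):

	case CPU_DOWN_FAILED:
	case CPU_ONLINE:
		mutex_lock(&wq_pool_mutex);

		for_each_pool(pool, pi) {
			mutex_lock(&pool->attach_mutex);

			/* per-CPU pool for this CPU: rebind its workers */
			if (pool->cpu == cpu)
				rebind_workers(pool);
			/* unbound pool: restore its workers' cpumask */
			else if (pool->cpu < 0)
				restore_unbound_workers_cpumask(pool, cpu);

			mutex_unlock(&pool->attach_mutex);
		}

		/* update NUMA affinity of unbound workqueues */
		list_for_each_entry(wq, &workqueues, list)
			wq_update_unbound_numa(wq, cpu, true);

		mutex_unlock(&wq_pool_mutex);
		break;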

However, I agree with you that during boot-up, the workqueue subsystem
needs to create unbound worker threads for only the online CPUs
(instead of all possible CPUs as it currently does!) and let the
CPU_ONLINE notification take care of creating the remaining workers
when they are really required.

> 
> Or is that what PPC does funny? Use an unbound workqueue this early in
> cpu bringup?

Like I pointed out above, PPC doesn't use an unbound workqueue early
in CPU bringup.

--
Thanks and Regards
gautham.

Patch

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e412794..1199f73 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4586,7 +4586,7 @@  static void rebind_workers(struct worker_pool *pool)
  *
  * An unbound pool may end up with a cpumask which doesn't have any online
  * CPUs.  When a worker of such pool get scheduled, the scheduler resets
- * its cpus_allowed.  If @cpu is in @pool's cpumask which didn't have any
+ * its cpus_allowed.  If @cpu is in @pool's cpumask which had at most one
  * online CPU before, cpus_allowed of all its workers should be restored.
  */
 static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
@@ -4600,15 +4600,26 @@  static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
 		return;
 
-	/* is @cpu the only online CPU? */
 	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
-	if (cpumask_weight(&cpumask) != 1)
+
+	/*
+	 * The affinity needs to be set
+	 * a) to @cpu when that is the only online CPU in
+	 *    pool->attrs->cpumask.
+	 * b) to pool->attrs->cpumask when exactly two CPUs in
+	 *    pool->attrs->cpumask are online. This affinity will be
+	 *    retained when subsequent CPUs come online.
+	 */
+	if (cpumask_weight(&cpumask) > 2)
 		return;
 
+	if (cpumask_weight(&cpumask) == 2)
+		cpumask_copy(&cpumask, pool->attrs->cpumask);
+
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+						  &cpumask) < 0);
 }
 
 /*