
[2/2] workqueue: Fix affinity of an unbound worker of a node with 1 online CPU

Message ID 20160615113249.GH30909@twins.programming.kicks-ass.net (mailing list archive)
State Not Applicable

Commit Message

Peter Zijlstra June 15, 2016, 11:32 a.m. UTC
On Wed, Jun 15, 2016 at 03:49:36PM +0530, Gautham R Shenoy wrote:

> Also, with the first patch in the series (which ensures that
> restore_unbound_workers are called *after* the new workers for the
> newly onlined CPUs are created) and without this one, you can
> reproduce this WARN_ON on both x86 and PPC by offlining all the CPUs
> of a node and bringing just one of them online.

Ah good.

> I am not sure about that. The workqueue creates unbound workers for a
> node via wq_update_unbound_numa() whenever the first CPU of that node
> comes online. So that seems legitimate. It then tries to affine these
> workers to the cpumask of that node. Again, this seems right. As an
> optimization, it does this only when the first CPU of the node comes
> online. Since this newly onlined CPU is not yet active, and since
> nr_cpus_allowed > 1, we will hit the WARN_ON().

So I had another look and isn't the below a much simpler solution?

It seems to work on my x86 with:

  for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
  for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done

without complaint.

---
 kernel/workqueue.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)
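
A node-local variant of the same reproduction, matching the scenario
Gautham describes above (offline every CPU of one node, then bring a
single one back online), might look like the sketch below; node1 and
cpu8 are placeholder names for a hot-pluggable node and one of its
CPUs, and note that cpu0 typically exposes no online attribute on x86:

  # offline all CPUs of node1, then online just one of them
  for i in /sys/devices/system/node/node1/cpu[0-9]*/online ; do echo 0 > $i ; done
  echo 1 > /sys/devices/system/node/node1/cpu8/online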

Comments

Gautham R Shenoy June 15, 2016, 12:50 p.m. UTC | #1
On Wed, Jun 15, 2016 at 01:32:49PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 03:49:36PM +0530, Gautham R Shenoy wrote:
> 
> > Also, with the first patch in the series (which ensures that
> > restore_unbound_workers are called *after* the new workers for the
> > newly onlined CPUs are created) and without this one, you can
> > reproduce this WARN_ON on both x86 and PPC by offlining all the CPUs
> > of a node and bringing just one of them online.
> 
> Ah good.
> 
> > I am not sure about that. The workqueue creates unbound workers for a
> > node via wq_update_unbound_numa() whenever the first CPU of that node
> > comes online. So that seems legitimate. It then tries to affine these
> > workers to the cpumask of that node. Again, this seems right. As an
> > optimization, it does this only when the first CPU of the node comes
> > online. Since this newly onlined CPU is not yet active, and since
> > nr_cpus_allowed > 1, we will hit the WARN_ON().
> 
> So I had another look and isn't the below a much simpler solution?
> 
> It seems to work on my x86 with:
> 
>   for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
>   for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done
> 
> without complaint.

Yup. This will work on PPC as well. 

We will no longer have the optimization in
restore_unbound_workers_cpumask(), but I suppose we don't lose much by
resetting the affinity every time a CPU in the pool->attrs->cpumask
comes online.

--
Thanks and Regards
gautham.

Peter Zijlstra June 15, 2016, 1:14 p.m. UTC | #2
On Wed, Jun 15, 2016 at 06:20:33PM +0530, Gautham R Shenoy wrote:
> We will no longer have the optimization in
> restore_unbound_workers_cpumask(), but I suppose we don't lose much by
> resetting the affinity every time a CPU in the pool->attrs->cpumask
> comes online.

Right; optimizing hotplug really isn't worth it. The code needs to be
simple and robust (ha! funny).

In any case, Tejun, does this work for you?

Tejun Heo June 15, 2016, 4:01 p.m. UTC | #3
Hello,

On Wed, Jun 15, 2016 at 03:14:15PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 06:20:33PM +0530, Gautham R Shenoy wrote:
> > We will no longer have the optimization in
> > restore_unbound_workers_cpumask(), but I suppose we don't lose much by
> > resetting the affinity every time a CPU in the pool->attrs->cpumask
> > comes online.
> 
> Right; optimizing hotplug really isn't worth it. The code needs to be
> simple and robust (ha! funny).

The only case where it might matter is CPU hotplug being used
aggressively for power saving.  No idea how popular that is now,
though.  set_cpus_allowed isn't that expensive and phones don't tend
to have a massive number of kworkers, so hopefully it won't show up.

> In any case, Tejun, does this work for you?

I'm not sure about the reordering part, but no objection to setting
the affinity on each onlining.  If it ever shows up as a performance /
power regression, we can revisit it later.

Thanks.

Michael Ellerman June 16, 2016, 12:11 p.m. UTC | #4
On Wed, 2016-06-15 at 12:01 -0400, Tejun Heo wrote:
> On Wed, Jun 15, 2016 at 03:14:15PM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 15, 2016 at 06:20:33PM +0530, Gautham R Shenoy wrote:
> > > We will no longer have the optimization in
> > > restore_unbound_workers_cpumask(), but I suppose we don't lose much by
> > > resetting the affinity every time a CPU in the pool->attrs->cpumask
> > > comes online.
> > 
> > Right; optimizing hotplug really isn't worth it. The code needs to be
> > simple and robust (ha! funny).
> 
> The only case where it might matter is CPU hotplug being used
> aggressively for power saving.  No idea how popular that is now,
> though.  set_cpus_allowed isn't that expensive and phones don't tend
> to have a massive number of kworkers, so hopefully it won't show up.
> 
> > In any case, Tejun, does this work for you?
> 
> I'm not sure about the reordering part, but no objection to setting
> the affinity on each onlining.  If it ever shows up as a performance /
> power regression, we can revisit it later.

Peterz, do you want to send a SOB'ed patch, or can we take what you posted and
add your SOB?

And Tejun, are you happy to merge this for rc4?

cheers

Patch
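
For context, here is a sketch of the pre-patch body of
restore_unbound_workers_cpumask(), the function Gautham's analysis
refers to, reconstructed from the context and removed lines of the
first hunk below (the declarations and the lockdep assert sit outside
the hunk and are assumed from the surrounding kernel source of that
era):

  static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
  {
          static cpumask_t cpumask;
          struct worker *worker;

          lockdep_assert_held(&pool->attach_mutex);

          /* is @cpu allowed for @pool? */
          if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
                  return;

          /* is @cpu the only online CPU? */
          cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
          if (cpumask_weight(&cpumask) != 1)
                  return;

          /* as we're called from CPU_ONLINE, the following shouldn't fail */
          for_each_pool_worker(worker, pool)
                  WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
                                                    pool->attrs->cpumask) < 0);
  }

The cpumask_weight() gate is the "optimization" being dropped: affinity
was only restored when @cpu was the sole online CPU of the pool's mask,
and the workers were then bound to the full pool->attrs->cpumask.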

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e1c0e99..09c9160 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4600,15 +4600,11 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
 		return;
 
-	/* is @cpu the only online CPU? */
 	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
-	if (cpumask_weight(&cpumask) != 1)
-		return;
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
 }
 
 /*
@@ -4638,6 +4634,10 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 	case CPU_ONLINE:
 		mutex_lock(&wq_pool_mutex);
 
+		/* update NUMA affinity of unbound workqueues */
+		list_for_each_entry(wq, &workqueues, list)
+			wq_update_unbound_numa(wq, cpu, true);
+
 		for_each_pool(pool, pi) {
 			mutex_lock(&pool->attach_mutex);
 
@@ -4649,10 +4649,6 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 			mutex_unlock(&pool->attach_mutex);
 		}
 
-		/* update NUMA affinity of unbound workqueues */
-		list_for_each_entry(wq, &workqueues, list)
-			wq_update_unbound_numa(wq, cpu, true);
-
 		mutex_unlock(&wq_pool_mutex);
 		break;
 	}
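
For reference, applying the first hunk leaves
restore_unbound_workers_cpumask() looking roughly as follows (a
reconstruction from the hunk's context and added lines; declarations
outside the hunk are assumed from the surrounding source):

  static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
  {
          static cpumask_t cpumask;
          struct worker *worker;

          lockdep_assert_held(&pool->attach_mutex);

          /* is @cpu allowed for @pool? */
          if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
                  return;

          cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);

          /* as we're called from CPU_ONLINE, the following shouldn't fail */
          for_each_pool_worker(worker, pool)
                  WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
  }

Two behavioral changes follow from the diff: workers are rebound on
every online event of a CPU in the pool's cpumask, not only the first,
and they are bound to the currently-online subset (&cpumask) rather
than the full pool->attrs->cpumask. The remaining two hunks move the
wq_update_unbound_numa() update ahead of the per-pool rebinding in
workqueue_cpu_up_callback(), which is the "reordering part" Tejun
mentions above.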