From patchwork Thu Jun 16 19:35:04 2016
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 636672
Date: Thu, 16 Jun 2016 15:35:04 -0400
From: Tejun Heo
To: Gautham R Shenoy
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Abdul Haleem,
 Aneesh Kumar, Thomas Gleixner, linuxppc-dev@lists.ozlabs.org,
 kernel-team@fb.com
Subject: Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the
 beginning of CPU_ONLINE
Message-ID: <20160616193504.GB3262@mtj.duckdns.org>
References: <6b3c7059ec5d2d6157d23d619e4507692a42a5bd.1465311052.git.ego@linux.vnet.ibm.com>
 <20160615155350.GB24102@mtj.duckdns.org>
 <20160615192844.GA20301@in.ibm.com>
In-Reply-To: <20160615192844.GA20301@in.ibm.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
List-Id: Linux on PowerPC Developers Mail List

Hello,

So, the issue of the initial worker not having its affinity set
correctly wasn't caused by the order of the operations.  Reordering
just made set_cpus_allowed get tried one more time, late enough that
it hides the race condition most of the time.

The problem is that CPU_ONLINE callbacks are called while the cpu
being onlined is online but not active, and select_fallback_rq() only
considers active cpus.  So if a kthread gets scheduled in the meantime
and no cpu in its allowed mask is active, its allowed mask gets reset
to cpu_possible_mask.

Would something like the following make sense?

Thanks.

------ 8< ------
Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus

During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
online but not active.  A CPU_ONLINE callback may create or bind a
kthread so that its cpus_allowed mask only allows the CPU which is
being brought online.  The kthread may start executing before the CPU
is made active and can end up in select_fallback_rq().

In such cases, the expected behavior is selecting the CPU which is
coming online; however, because select_fallback_rq() only chooses from
active CPUs, it determines that the task doesn't have any viable CPU
in its allowed mask and ends up overriding it to cpu_possible_mask.

CPU_ONLINE callbacks should be able to put kthreads on the CPU which
is coming online.  Update select_fallback_rq() so that it follows
cpu_online() rather than cpu_active() for kthreads.

Signed-off-by: Tejun Heo
Reported-by: Gautham R Shenoy
Tested-by: Gautham R. Shenoy
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 017d539..a12e3db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
 	for (;;) {
 		/* Any allowed, online CPU? */
 		for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) {
-			if (!cpu_active(dest_cpu))
+			if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
+				continue;
+			if (!cpu_online(dest_cpu))
 				continue;
 			goto out;
 		}
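
For illustration only, the effect of the changed check can be modelled in
user space.  The sketch below is not kernel code: struct cpu_state, struct
task_model, the two usable_*() helpers and pick_fallback() are hypothetical
names made up for this example, and plain bitmasks stand in for the kernel's
cpumasks.  It only mirrors the first loop of select_fallback_rq() as
described above.

```c
#include <stdbool.h>

/*
 * User-space model of the CPU check in select_fallback_rq().
 * All names below are hypothetical stand-ins, not kernel APIs.
 */
#define MODEL_NR_CPUS 8

struct cpu_state {
	unsigned int online;	/* bit n set: CPU n is online */
	unsigned int active;	/* bit n set: CPU n is active */
};

struct task_model {
	bool is_kthread;	/* stands in for p->flags & PF_KTHREAD */
	unsigned int allowed;	/* stands in for the task's allowed mask */
};

typedef bool (*usable_fn)(const struct cpu_state *s,
			  const struct task_model *t, int cpu);

/* Before the patch: only active CPUs qualify, for every task. */
static bool usable_before_patch(const struct cpu_state *s,
				const struct task_model *t, int cpu)
{
	(void)t;
	return (s->active & (1u << cpu)) != 0;
}

/* After the patch: kthreads only need the CPU to be online. */
static bool usable_after_patch(const struct cpu_state *s,
			       const struct task_model *t, int cpu)
{
	if (!t->is_kthread && !(s->active & (1u << cpu)))
		return false;
	return (s->online & (1u << cpu)) != 0;
}

/*
 * Return the first allowed CPU that passes the check, or -1 for the
 * case where the kernel would go on to override the allowed mask.
 */
static int pick_fallback(const struct cpu_state *s,
			 const struct task_model *t, usable_fn usable)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(t->allowed & (1u << cpu)))
			continue;
		if (usable(s, t, cpu))
			return cpu;
	}
	return -1;
}
```

With CPUs 0-2 online and active and CPU 3 online but not yet active
(online = 0x0f, active = 0x07), a kthread allowed only on CPU 3 gets -1
from the old check and 3 from the new one, while a non-kthread task with
the same mask still gets -1.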