From patchwork Mon Dec 8 13:54:05 2014
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 418696
Date: Mon, 8 Dec 2014 08:54:05 -0500
From: Steven Rostedt
To: Anton Blanchard
Subject: Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
Message-ID: <20141208085405.730577a3@gandalf.local.home>
In-Reply-To: <1418009221-12719-1-git-send-email-anton@samba.org>
References: <1418009221-12719-1-git-send-email-anton@samba.org>
Cc: yuyang.du@intel.com, computersforpeace@gmail.com, peterz@infradead.org,
	lkp@01.org, rafael.j.wysocki@intel.com, yuanhan.liu@linux.intel.com,
	linux-kernel@vger.kernel.org, bsegall@google.com,
	linuxppc-dev@lists.ozlabs.org, mingo@redhat.com, sp@datera.io,
	daniel@numascale.com, tj@kernel.org, subbaram@codeaurora.org,
	akpm@linux-foundation.org, fengguang.wu@intel.com,
	torvalds@linux-foundation.org, tglx@linutronix.de, pjt@google.com

On Mon, 8 Dec 2014 14:27:01 +1100
Anton Blanchard wrote:

> I have a busy ppc64le KVM box where guests sometimes hit the infamous
> "kernel BUG at kernel/smpboot.c:134!" issue during boot:
>
> BUG_ON(td->cpu != smp_processor_id());
>
> Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
> output confirms it:
>
> CPU: 0
> Comm: watchdog/130
>
> The issue is in kthread_bind where we set the cpus_allowed mask, but do
> not touch task_thread_info(p)->cpu. The scheduler assumes the previously
> scheduled CPU is in the cpus_allowed mask, but in this case we are
> moving a thread to another CPU so it is not.
>

Does this always happen at boot, and always with the watchdog thread?

I followed the logic that starts the watchdog threads:

  watchdog_enable_all_cpus()
    smpboot_register_percpu_thread() {
      for_each_online_cpu(cpu) {
        ...
      }

watchdog_enable_all_cpus() can be called by lockup_detector_init()
before SMP is started, but also by proc_dowatchdog(), which is called by
the sysctl commands (after SMP is up and running).

I noticed there's no get_online_cpus() anywhere in that path, although
smpboot_unregister_percpu_thread() has it. Is it possible that we
created a thread on a CPU that wasn't fully online yet?

Perhaps the following patch is needed? Even if this isn't the solution
to this bug, it is probably needed, as watchdog_enable_all_cpus() can be
called after boot up too.

-- Steve

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index eb89e1807408..60d35ac5d3f1 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -279,6 +279,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	unsigned int cpu;
 	int ret = 0;
 
+	get_online_cpus();
 	mutex_lock(&smpboot_threads_lock);
 	for_each_online_cpu(cpu) {
 		ret = __smpboot_create_thread(plug_thread, cpu);
@@ -291,6 +292,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	list_add(&plug_thread->list, &hotplug_threads);
 out:
 	mutex_unlock(&smpboot_threads_lock);
+	put_online_cpus();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);