From patchwork Tue Apr 8 03:19:36 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Wang
X-Patchwork-Id: 337566
Message-ID: <53436AC8.5020705@linux.vnet.ibm.com>
Date: Tue, 08 Apr 2014 11:19:36 +0800
From: Michael Wang
To: linuxppc-dev@lists.ozlabs.org, LKML, benh@kernel.crashing.org,
 paulus@samba.org, nfont@linux.vnet.ibm.com, sfr@canb.auug.org.au,
 Andrew Morton, rcj@linux.vnet.ibm.com, jlarrew@linux.vnet.ibm.com,
 srivatsa.bhat@linux.vnet.ibm.com, alistair@popple.id.au
Subject: [PATCH v2] power, sched: stop updating inside
 arch_update_cpu_topology() when nothing to be update
References: <533B8431.8090507@linux.vnet.ibm.com>
In-Reply-To: <533B8431.8090507@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

Since v1:
	Edited the comment according to Srivatsa's suggestion.

During testing, we encountered the WARN below, followed by an Oops:

	WARNING: at kernel/sched/core.c:6218
	...
	NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
	LR [c000000000101358] .build_sched_domains+0xec8/0x1200
	PACATMSCRATCH [800000000000f032]
	Call Trace:
	[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
	...
	Oops: Kernel access of bad area, sig: 11 [#1]
	...
	NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
	LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
	PACATMSCRATCH [8000000000029032]
	Call Trace:
	[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
	[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
	...

This happened because 'sd->groups' was NULL after the groups were built,
which in turn was caused by an empty 'sd->span'.

The cpu's domain contained nothing because the cpu was assigned to the
wrong node, due to the following unfortunate sequence of events:

1. The hypervisor sent a topology update to the guest OS, to notify it of
   changes to the cpu-node mapping. However, the update was actually
   redundant - i.e., the "new" mapping was exactly the same as the old one.

2. Due to this, the 'updated_cpus' mask turned out to be empty after
   exiting the 'for-loop' in arch_update_cpu_topology().

3. So we ended up calling stop_machine() with an empty cpumask, which
   made stop-machine internally elect cpumask_first(cpu_online_mask),
   i.e., CPU0, as the cpu to run the payload (the update_cpu_topology()
   function).

4. This causes update_cpu_topology() to be run by CPU0.
   And since 'updates' is kzalloc()'ed inside arch_update_cpu_topology(),
   update_cpu_topology() finds update->cpu as well as update->new_nid to
   be 0. In other words, we end up incorrectly assigning CPU0 (and
   eventually its siblings) to node 0.

Together with the incorrect updates that follow, this causes the
sched-domain rebuild code to break and crash the system.

Fix this by skipping the topology update in cases where we find that
the topology has not actually changed (i.e., spurious updates).

CC: Benjamin Herrenschmidt
CC: Paul Mackerras
CC: Nathan Fontenot
CC: Stephen Rothwell
CC: Andrew Morton
CC: Robert Jennings
CC: Jesse Larrew
CC: "Srivatsa S. Bhat"
CC: Alistair Popple
Suggested-by: "Srivatsa S. Bhat"
Signed-off-by: Michael Wang
Reviewed-by: Srivatsa S. Bhat
---
 arch/powerpc/mm/numa.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 30a42e2..4ebbb9e 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1591,6 +1591,20 @@ int arch_update_cpu_topology(void)
 		cpu = cpu_last_thread_sibling(cpu);
 	}
 
+	/*
+	 * In cases where we have nothing to update (because the updates list
+	 * is too short or because the new topology is same as the old one),
+	 * skip invoking update_cpu_topology() via stop-machine(). This is
+	 * necessary (and not just a fast-path optimization) since stop-machine
+	 * can end up electing a random CPU to run update_cpu_topology(), and
+	 * thus trick us into setting up incorrect cpu-node mappings (since
+	 * 'updates' is kzalloc()'ed).
+	 *
+	 * And for the similar reason, we will skip all the following updating.
+	 */
+	if (!cpumask_weight(&updated_cpus))
+		goto out;
+
 	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
 
 	/*
@@ -1612,6 +1626,7 @@ int arch_update_cpu_topology(void)
 		changed = 1;
 	}
 
+out:
 	kfree(updates);
 	return changed;
 }
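
For illustration only, the failure sequence described in the changelog can
be modelled in plain user-space C. This is a rough sketch, not kernel code:
every name starting with toy_, the cpu_to_node_map table, NR_CPUS = 4 and
the simplified struct topology_update are assumptions made for the example,
and toy_stop_machine() only mimics the behaviour the changelog describes
(an empty mask leading to the payload running on CPU0).

```c
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4

/* Hypothetical, simplified stand-in for the kernel's update record. */
struct topology_update {
	int cpu;
	int new_nid;
};

/* Toy cpu-to-node table; not the kernel's real data structure. */
int cpu_to_node_map[NR_CPUS];

/* Payload: applies the update only if it targets the CPU it runs on. */
int toy_update_cpu_topology(int running_cpu, const struct topology_update *update)
{
	if (running_cpu != update->cpu)
		return 0;
	cpu_to_node_map[update->cpu] = update->new_nid;
	return 1;
}

/*
 * Mimics step 3 of the changelog: with an empty mask the payload is
 * run on the first online CPU, assumed here to be CPU0.
 */
int toy_stop_machine(const struct topology_update *update, unsigned long mask)
{
	int cpu, applied = 0;

	if (mask == 0)
		return toy_update_cpu_topology(0, update);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			applied += toy_update_cpu_topology(cpu, update);
	return applied;
}

/*
 * Models the tail of arch_update_cpu_topology() for a redundant
 * notification: no cpu changed node, so 'updates' is still all zeroes
 * (as after kzalloc()) and 'updated_cpus' is empty.
 * Returns the number of cpu-node mappings that were written.
 */
int toy_redundant_update(bool with_guard)
{
	struct topology_update updates[NR_CPUS];
	unsigned long updated_cpus = 0;

	memset(updates, 0, sizeof(updates));

	/* The check added by this patch: nothing to update, so bail out. */
	if (with_guard && updated_cpus == 0)
		return 0;

	return toy_stop_machine(&updates[0], updated_cpus);
}
```

In this model, if CPU0 starts out on node 1, calling
toy_redundant_update(false) rewrites its mapping to node 0 through the
zeroed record, while toy_redundant_update(true) leaves the table untouched.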