From patchwork Mon Apr 22 18:44:09 2013
X-Patchwork-Submitter: Nathan Fontenot
X-Patchwork-Id: 238623
X-Patchwork-Delegate: benh@kernel.crashing.org
Message-ID: <517584F9.7080102@linux.vnet.ibm.com>
Date: Mon, 22 Apr 2013 13:44:09 -0500
From: Nathan Fontenot
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v3 8/12] Use stop machine to update cpu maps
In-Reply-To: <51757951.2080007@linux.vnet.ibm.com>
References: <51757951.2080007@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

The new PRRN firmware feature allows CPU and memory resources to be
transparently reassigned across NUMA boundaries. When this happens, the
kernel must update the node maps to reflect the new affinity
information.

Although the NUMA maps can be protected by locking primitives during the
update itself, this is insufficient to prevent concurrent accesses to
these structures. Since cpumask_of_node() hands out a pointer to these
structures, they can still be accessed outside of the lock. Furthermore,
tracking down each use of these pointers and adding locks would be quite
invasive and difficult to maintain.

The approach used here is to build a list of affected cpus and call
stop_machine() to run the update routine on each of those cpus, allowing
each one to update itself.
Each cpu finds itself in the list of cpus and makes the appropriate
updates. We need each cpu to do this for itself in order to handle the
call to vdso_getcpu_init() that is added in a subsequent patch.

Situations like this are best handled using stop_machine(). Since the
NUMA affinity updates are exceptionally rare events, this approach has
the benefit of not adding any overhead while accessing the NUMA maps
during normal operation.

A minimal standalone sketch of this stop_machine() pattern follows the
patch below.

Signed-off-by: Nathan Fontenot
---
 arch/powerpc/mm/numa.c |   82 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 18 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-04-17 14:04:12.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-04-18 09:10:11.000000000 -0500
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include <linux/stop_machine.h>
 #include
 #include
 #include
@@ -1254,6 +1255,13 @@
 
 /* Virtual Processor Home Node (VPHN) support */
 #ifdef CONFIG_PPC_SPLPAR
+struct topology_update_data {
+	struct topology_update_data *next;
+	unsigned int cpu;
+	int old_nid;
+	int new_nid;
+};
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
@@ -1405,41 +1413,79 @@
 }
 
 /*
+ * Update the CPU maps and sysfs entries for a single CPU when its NUMA
+ * characteristics change. This function doesn't perform any locking and is
+ * only safe to call from stop_machine().
+ */
+static int update_cpu_topology(void *data)
+{
+	struct topology_update_data *update;
+	unsigned long cpu;
+
+	if (!data)
+		return -EINVAL;
+
+	cpu = get_cpu();
+
+	for (update = data; update; update = update->next) {
+		if (cpu != update->cpu)
+			continue;
+
+		unregister_cpu_under_node(update->cpu, update->old_nid);
+		unmap_cpu_from_node(update->cpu);
+		map_cpu_to_node(update->cpu, update->new_nid);
+		register_cpu_under_node(update->cpu, update->new_nid);
+	}
+
+	return 0;
+}
+
+/*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
  */
 int arch_update_cpu_topology(void)
 {
-	int cpu, nid, old_nid, changed = 0;
+	unsigned int cpu, changed = 0;
+	struct topology_update_data *updates, *ud;
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	struct device *dev;
+	int weight, i = 0;
+
+	weight = cpumask_weight(&cpu_associativity_changes_mask);
+	if (!weight)
+		return 0;
+
+	updates = kzalloc(weight * (sizeof(*updates)), GFP_KERNEL);
+	if (!updates)
+		return 0;
 
 	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
+		ud = &updates[i++];
+		ud->cpu = cpu;
 		vphn_get_associativity(cpu, associativity);
-		nid = associativity_to_nid(associativity);
+		ud->new_nid = associativity_to_nid(associativity);
 
-		if (nid < 0 || !node_online(nid))
-			nid = first_online_node;
+		if (ud->new_nid < 0 || !node_online(ud->new_nid))
+			ud->new_nid = first_online_node;
 
-		old_nid = numa_cpu_lookup_table[cpu];
+		ud->old_nid = numa_cpu_lookup_table[cpu];
 
-		/* Disable hotplug while we update the cpu
-		 * masks and sysfs.
-		 */
-		get_online_cpus();
-		unregister_cpu_under_node(cpu, old_nid);
-		unmap_cpu_from_node(cpu);
-		map_cpu_to_node(cpu, nid);
-		register_cpu_under_node(cpu, nid);
-		put_online_cpus();
+		if (i < weight)
+			ud->next = &updates[i];
+	}
+
+	stop_machine(update_cpu_topology, &updates[0], cpu_online_mask);
 
-		dev = get_cpu_device(cpu);
+	for (ud = &updates[0]; ud; ud = ud->next) {
+		dev = get_cpu_device(ud->cpu);
 		if (dev)
 			kobject_uevent(&dev->kobj, KOBJ_CHANGE);
-		cpumask_clear_cpu(cpu, &cpu_associativity_changes_mask);
+		cpumask_clear_cpu(ud->cpu, &cpu_associativity_changes_mask);
 		changed = 1;
 	}
 
+	kfree(updates);
 	return changed;
 }
@@ -1488,10 +1534,10 @@
 	int rc = NOTIFY_DONE;
 
 	switch (action) {
-	case OF_RECONFIG_ADD_PROPERTY:
 	case OF_RECONFIG_UPDATE_PROPERTY:
 		update = (struct of_prop_reconfig *)data;
-		if (!of_prop_cmp(update->dn->type, "cpu")) {
+		if (!of_prop_cmp(update->dn->type, "cpu") &&
+		    !of_prop_cmp(update->prop->name, "ibm,associativity")) {
 			u32 core_id;
 			of_property_read_u32(update->dn, "reg", &core_id);
 			stage_topology_update(core_id);
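
For reference, here is a minimal, self-contained sketch (not part of the
patch above) of the stop_machine() pattern the changelog describes: build
a chain of per-cpu work items, pass it to stop_machine(), and have every
online cpu walk the chain and apply only the entry matching its own cpu
id. The names used here (struct cpu_update, apply_updates,
stopm_demo_init) are illustrative and do not appear in the kernel; the
sketch assumes it is built as an out-of-tree module.

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/smp.h>
#include <linux/cpumask.h>
#include <linux/stop_machine.h>

/* One work item per cpu whose per-cpu state needs to change. */
struct cpu_update {
	struct cpu_update *next;
	unsigned int cpu;
	int new_value;
};

/*
 * Runs on every cpu in the mask handed to stop_machine(), with the other
 * cpus spinning and interrupts disabled, so no extra locking is needed.
 */
static int apply_updates(void *data)
{
	struct cpu_update *u;
	unsigned int cpu = smp_processor_id();

	for (u = data; u; u = u->next) {
		if (u->cpu != cpu)
			continue;
		/* The real patch remaps the cpu to its new node here. */
		pr_info("cpu %u: applying new value %d\n", cpu, u->new_value);
	}
	return 0;
}

static int __init stopm_demo_init(void)
{
	struct cpu_update *u = kzalloc(sizeof(*u), GFP_KERNEL);

	if (!u)
		return -ENOMEM;

	u->cpu = 0;		/* update only cpu 0 in this demo */
	u->new_value = 42;
	u->next = NULL;

	/* Every online cpu runs apply_updates(); only cpu 0 matches. */
	stop_machine(apply_updates, u, cpu_online_mask);

	kfree(u);
	return 0;
}

static void __exit stopm_demo_exit(void)
{
}

module_init(stopm_demo_init);
module_exit(stopm_demo_exit);
MODULE_LICENSE("GPL");

Because stop_machine() serializes the whole machine for the duration of
the callback, this pattern only makes sense for rare events such as the
PRRN-driven affinity updates; readers of the NUMA maps pay no cost in
the common case.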