Message ID | d8b67f9d-77d0-2be3-15bc-c7f11bc519f7@linux.vnet.ibm.com (mailing list archive)
---|---
State | Rejected
Series | [v04] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | next/apply_patch Successfully applied |
snowpatch_ozlabs/build-ppc64le | fail | build failed! |
snowpatch_ozlabs/build-ppc64be | fail | build failed! |
snowpatch_ozlabs/build-ppc64e | success | build succeeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/build-pmac32 | success | build succeeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/checkpatch | fail | total: 1 errors, 2 warnings, 6 checks, 160 lines checked |
On Thu, 14 Feb 2019 09:56:26 -0600
Michael Bringmann <mwb@linux.vnet.ibm.com> wrote:

Hello,

> To: linuxppc-dev@lists.ozlabs.org
> To: linux-kernel@vger.kernel.org
> Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Paul Mackerras <paulus@samba.org>
> Michael Ellerman <mpe@ellerman.id.au>
> Nathan Lynch <nathanl@linux.vnet.ibm.com>
> Corentin Labbe <clabbe@baylibre.com>
> Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Guenter Roeck <linux@roeck-us.net>
> Michael Bringmann <mwb@linux.vnet.ibm.com>
> "Oliver O'Halloran" <oohall@gmail.com>
> Russell Currey <ruscur@russell.cc>
> Haren Myneni <haren@us.ibm.com>
> Al Viro <viro@zeniv.linux.org.uk>
> Kees Cook <keescook@chromium.org>
> Nicholas Piggin <npiggin@gmail.com>
> Rob Herring <robh@kernel.org>
> Juliet Kim <minkim@us.ibm.com>
> Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
> Date: 2018-11-05 16:14:12 -0600
> Subject: [PATCH v04] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update

Looks like something went wrong with the e-mail headers. Maybe you
should re-send?

Thanks

Michal

> On pseries systems, performing changes to a partition's affinity
> can result in altering the nodes a CPU is assigned to the
> current system. For example, some systems are subject to resource
> balancing operations by the operator or control software. In such
> environments, system CPUs may be in node 1 and 3 at boot, and be
> moved to nodes 2, 3, and 5, for better performance.
>
> The current implementation attempts to recognize such changes within
> the powerpc-specific version of arch_update_cpu_topology to modify a
> range of system data structures directly. However, some scheduler
> data structures may be inaccessible, or the timing of a node change
> may still lead to corruption or error in other modules (e.g. user
> space) which do not receive notification of these changes.
>
> This patch modifies the PRRN/VPHN topology update worker function to
> recognize an affinity change for a CPU, and to perform a full DLPAR
> remove and add of the CPU instead of dynamically changing its node
> to resolve this issue.
>
> [Based upon patch submission:
> Subject: [PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration
> From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> Date: Tue Oct 30 05:43:36 AEDT 2018
> ]
>
> [Replace patch submission:
> Subject: [PATCH] powerpc/topology: Update numa mask when cpu node mapping changes
> From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Date: Wed Oct 10 15:24:46 AEDT 2018
> ]
>
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
> Changes in v04:
>   -- Revise tests in topology_timer_fn to check vphn_enabled before
>      prrn_enabled
>   -- Remove unnecessary changes to numa_update_cpu_topology
> Changes in v03:
>   -- Fixed under-scheduling of topo updates.
> Changes in v02:
>   -- Reuse more of the previous implementation to reduce patch size
>   -- Replace former calls to numa_update_cpu_topology(false) by
>      topology_schedule_update
>   -- Make sure that we report topology changes back through
>      arch_update_cpu_topology
>   -- Fix problem observed in powerpc next kernel with updating
>      cpu_associativity_changes_mask in timer_topology_fn when both
>      prrn_enabled and vphn_enabled, and many extra CPUs are possible,
>      but not installed.
>   -- Fix problem with updating cpu_associativity_changes_mask when
>      VPHN associativity information does not arrive until after first
>      call to update topology occurs.
> ---
>  arch/powerpc/include/asm/topology.h |    7 +----
>  arch/powerpc/kernel/rtasd.c         |    2 +
>  arch/powerpc/mm/numa.c              |   47 +++++++++++++++++++++++------------
>  3 files changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index f85e2b01c3df..79505c371fd5 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -42,7 +42,7 @@ extern void __init dump_numa_cpu_topology(void);
>
>  extern int sysfs_add_device_to_node(struct device *dev, int nid);
>  extern void sysfs_remove_device_from_node(struct device *dev, int nid);
> -extern int numa_update_cpu_topology(bool cpus_locked);
> +extern void topology_schedule_update(void);
>
>  static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
>  {
> @@ -77,10 +77,7 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
>  {
>  }
>
> -static inline int numa_update_cpu_topology(bool cpus_locked)
> -{
> -	return 0;
> -}
> +static inline void topology_schedule_update(void) {}
>
>  static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
>
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 8a1746d755c9..b1828de7ab78 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -285,7 +285,7 @@ static void handle_prrn_event(s32 scope)
>  	 * the RTAS event.
>  	 */
>  	pseries_devicetree_update(-scope);
> -	numa_update_cpu_topology(false);
> +	topology_schedule_update();
>  }
>
>  static void handle_rtas_event(const struct rtas_error_log *log)
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b5d1c45c1475..eb63479f09d7 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1077,6 +1077,8 @@ static int prrn_enabled;
>  static void reset_topology_timer(void);
>  static int topology_timer_secs = 1;
>  static int topology_inited;
> +static int topology_update_in_progress;
> +static int topology_changed;
>
>  /*
>   * Change polling interval for associativity changes.
> @@ -1297,9 +1299,9 @@ static int update_lookup_table(void *data)
>   * Update the node maps and sysfs entries for each cpu whose home node
>   * has changed. Returns 1 when the topology has changed, and 0 otherwise.
>   *
> - * cpus_locked says whether we already hold cpu_hotplug_lock.
> + * readd_cpus: Also readd any CPUs that have changed affinity
>   */
> -int numa_update_cpu_topology(bool cpus_locked)
> +static int numa_update_cpu_topology(bool readd_cpus)
>  {
>  	unsigned int cpu, sibling, changed = 0;
>  	struct topology_update_data *updates, *ud;
> @@ -1307,7 +1309,8 @@ int numa_update_cpu_topology(bool cpus_locked)
>  	struct device *dev;
>  	int weight, new_nid, i = 0;
>
> -	if (!prrn_enabled && !vphn_enabled && topology_inited)
> +	if ((!prrn_enabled && !vphn_enabled && topology_inited) ||
> +	    topology_update_in_progress)
>  		return 0;
>
>  	weight = cpumask_weight(&cpu_associativity_changes_mask);
> @@ -1318,6 +1321,8 @@ int numa_update_cpu_topology(bool cpus_locked)
>  	if (!updates)
>  		return 0;
>
> +	topology_update_in_progress = 1;
> +
>  	cpumask_clear(&updated_cpus);
>
>  	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
> @@ -1339,16 +1344,21 @@ int numa_update_cpu_topology(bool cpus_locked)
>
>  		new_nid = find_and_online_cpu_nid(cpu);
>
> -		if (new_nid == numa_cpu_lookup_table[cpu]) {
> +		if ((new_nid == numa_cpu_lookup_table[cpu]) ||
> +		    !cpu_present(cpu)) {
>  			cpumask_andnot(&cpu_associativity_changes_mask,
>  					&cpu_associativity_changes_mask,
>  					cpu_sibling_mask(cpu));
> -			dbg("Assoc chg gives same node %d for cpu%d\n",
> +			if (cpu_present(cpu))
> +				dbg("Assoc chg gives same node %d for cpu%d\n",
>  					new_nid, cpu);
>  			cpu = cpu_last_thread_sibling(cpu);
>  			continue;
>  		}
>
> +		if (readd_cpus)
> +			dlpar_cpu_readd(cpu);
> +
>  		for_each_cpu(sibling, cpu_sibling_mask(cpu)) {
>  			ud = &updates[i++];
>  			ud->next = &updates[i];
> @@ -1390,7 +1400,7 @@ int numa_update_cpu_topology(bool cpus_locked)
>  	if (!cpumask_weight(&updated_cpus))
>  		goto out;
>
> -	if (cpus_locked)
> +	if (!readd_cpus)
>  		stop_machine_cpuslocked(update_cpu_topology, &updates[0],
>  					&updated_cpus);
>  	else
> @@ -1401,9 +1411,9 @@ int numa_update_cpu_topology(bool cpus_locked)
>  	 * offline CPUs. It is best to perform this update from the stop-
>  	 * machine context.
>  	 */
> -	if (cpus_locked)
> +	if (!readd_cpus)
>  		stop_machine_cpuslocked(update_lookup_table, &updates[0],
> -					cpumask_of(raw_smp_processor_id()));
> +				cpumask_of(raw_smp_processor_id()));
>  	else
>  		stop_machine(update_lookup_table, &updates[0],
>  			     cpumask_of(raw_smp_processor_id()));
> @@ -1420,35 +1430,40 @@ int numa_update_cpu_topology(bool cpus_locked)
>  	}
>
>  out:
> +	topology_update_in_progress = 0;
>  	kfree(updates);
>  	return changed;
>  }
>
>  int arch_update_cpu_topology(void)
>  {
> -	return numa_update_cpu_topology(true);
> +	return numa_update_cpu_topology(false);
>  }
>
>  static void topology_work_fn(struct work_struct *work)
>  {
> -	rebuild_sched_domains();
> +	lock_device_hotplug();
> +	if (numa_update_cpu_topology(true))
> +		rebuild_sched_domains();
> +	unlock_device_hotplug();
>  }
>  static DECLARE_WORK(topology_work, topology_work_fn);
>
> -static void topology_schedule_update(void)
> +void topology_schedule_update(void)
>  {
> -	schedule_work(&topology_work);
> +	if (!topology_update_in_progress)
> +		schedule_work(&topology_work);
>  }
>
>  static void topology_timer_fn(struct timer_list *unused)
>  {
> -	if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
> -		topology_schedule_update();
> -	else if (vphn_enabled) {
> +	if (vphn_enabled) {
>  		if (update_cpu_associativity_changes_mask() > 0)
>  			topology_schedule_update();
>  		reset_topology_timer();
>  	}
> +	else if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
> +		topology_schedule_update();
>  }
>  static struct timer_list topology_timer;
>
> @@ -1553,7 +1568,7 @@ void __init shared_proc_topology_init(void)
>  	if (lppaca_shared_proc(get_lppaca())) {
>  		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
>  			    nr_cpumask_bits);
> -		numa_update_cpu_topology(false);
> +		topology_schedule_update();
>  	}
>  }
>
On Mon, 18 Feb 2019 11:49:17 +0100
Michal Suchánek <msuchanek@suse.de> wrote:

Nevermind

Looks like some version of the patch is queued in powerpc/next already.

Thanks

Michal
On 2/18/19 8:15 AM, Michal Suchánek wrote:
> On Mon, 18 Feb 2019 11:49:17 +0100
> Michal Suchánek <msuchanek@suse.de> wrote:
>
> Nevermind
>
> Looks like some version of the patch is queued in powerpc/next already.

Might you be referring to,

[PATCH] powerpc/pseries: Perform full re-add of CPU for topology update post-migration

aka

81b6132 powerpc/pseries: Perform full re-add of CPU for topology update post-mig

in the powerpc-next tree?  That is for the case of device-tree changes
observed after a migration.  This patch builds upon it for CPU affinity
changes observed via PRRN/VPHN events.

>
> Thanks
>
> Michal

Thanks.