Message ID: 1493816238-33120-6-git-send-email-imammedo@redhat.com
State: New
On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> Originally CPU threads were by default assigned in
> round-robin fashion. However it was causing issues in
> guest since CPU threads from the same socket/core could
> be placed on different NUMA nodes.
> Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> fixed it by grouping threads within a socket on the same node
> introducing cpu_index_to_socket_id() callback and commit
> 20bb648d (spapr: Fix default NUMA node allocation for threads)
> reused callback to fix similar issues for SPAPR machine
> even though socket doesn't make much sense there.
>
> As result QEMU ended up having 3 default distribution rules
> used by 3 targets /virt-arm, spapr, pc/.
>
> In effort of moving NUMA mapping for CPUs into possible_cpus,
> generalize default mapping in numa.c by making boards decide
> on default mapping and let them explicitly tell generic
> numa code to which node a CPU thread belongs to by replacing
> cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> which provides default node_id assigned by board to specified
> cpu_index.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

Just two extra comments below:

[...]
> +static CpuInstanceProperties
> +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
> +}
> +
[...]
> +static CpuInstanceProperties
> +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
>  }

The fact that these two implementations look exactly the same
made me wonder:

1) Why this isn't the default implementation;
2) Why exactly spapr needs a different implementation.

Then I noticed that there's nothing in the common machine code
that specifies that possible_cpus->cpus[] is indexed by
cpu_index. This means it is indeed safer to require each machine
to provide its own cpu_index_to_props implementation than having
a default implementation that can unexpectedly break (e.g. if
granularity at possible_cpus is not at VCPU/thread level).

I would still like to have an abstraction that wouldn't require
writing machine-specific code (e.g. cpu_index ranges to
possible_cpus like David suggested), but that's for a follow-up
series.

[...]
> +static CpuInstanceProperties
> +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>  {
> +    CPUArchId *core_slot;
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +
> +    /* make sure possible_cpu are intialized */
> +    mc->possible_cpu_arch_ids(machine);
> +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> +    assert(core_slot);
> +    return core_slot->props;
>  }

If you need to submit v3, maybe a comment here explaining why
spapr needs a different cpu_index_to_props implementation would
be helpful. I took a while to figure it out.
On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
[...]
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>       * rule grouping VCPUs by socket so that VCPUs from the same socket
>       * would be on the same node.
>       */
> +    if (!mc->cpu_index_to_instance_props) {
> +        error_report("default CPUs to NUMA node mapping isn't supported");
> +        exit(1);
> +    }

This will make people trying to use -numa on unsupported machines
see a misleading error message: instead of telling them that the
machine doesn't support NUMA at all, the message seems to imply
that NUMA may be supported and we just don't have default NUMA node
mapping support.

Probably a more generic "NUMA is not supported by this
machine-type" message before even trying to parse -numa would be
clearer. (I don't know if another patch in this series already
does that.)
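[Editorial note: Eduardo's suggestion of failing early with a clearer message can be sketched standalone as below. The `numa_supported` flag is hypothetical, not an existing MachineClass field; this is only an illustration of the check he describes, not QEMU code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's MachineClass; 'numa_supported' is a
 * hypothetical flag used only for this sketch. */
typedef struct {
    bool numa_supported;
} MachineClass;

/* Returns the error string to report (before any -numa parsing), or
 * NULL if parsing may proceed. */
static const char *check_numa_supported(const MachineClass *mc,
                                        bool numa_requested)
{
    if (numa_requested && !mc->numa_supported) {
        return "NUMA is not supported by this machine-type";
    }
    return NULL;
}
```

The point of doing the check this early is that the user sees a statement about the machine type itself, rather than a message about default node mapping that implies NUMA might otherwise work.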
On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
[...]
> diff --git a/numa.c b/numa.c
> index 6fc2393..ab1661d 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>
> -void parse_numa_opts(MachineClass *mc)
> +void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_cpu = bitmap_new(max_cpus);
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)

Expanding diff context:

>      /* Historically VCPUs were assigned in round-robin order to NUMA
>       * nodes. However it causes issues with guest not handling it nice
>       * in case where cores/threads from a multicore CPU appear on
>       * different nodes. So allow boards to override default distribution
>       * rule grouping VCPUs by socket so that VCPUs from the same socket
>       * would be on the same node.
>       */

The above comment looks obsolete, as we are removing the code inside
parse_numa_opts() that deals with grouping VCPUs by socket.

(Can be fixed by a follow-up patch, if necessary.)

> +    if (!mc->cpu_index_to_instance_props) {
> +        error_report("default CPUs to NUMA node mapping isn't supported");
> +        exit(1);
> +    }
>      if (i == nb_numa_nodes) {
>          for (i = 0; i < max_cpus; i++) {
> -            unsigned node_id = i % nb_numa_nodes;
> -            if (mc->cpu_index_to_socket_id) {
> -                node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> -            }
> +            CpuInstanceProperties props;
> +            props = mc->cpu_index_to_instance_props(ms, i);
>
> -            set_bit(i, numa_info[node_id].node_cpu);
> +            set_bit(i, numa_info[props.node_id].node_cpu);
>          }
>      }
>
[...]
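[Editorial note: the three per-board default distribution rules the series consolidates boil down to simple modular arithmetic. A standalone sketch for comparison, mirroring the expressions in the patch (smp_* values are plain parameters here, not QEMU globals):]

```c
#include <assert.h>

/* virt-arm: one possible_cpus entry per thread; round-robin over nodes */
static int virt_default_node(unsigned cpu_index, int nb_numa_nodes)
{
    return cpu_index % nb_numa_nodes;
}

/* pc: group threads by socket (pkg_id), so a socket stays on one node */
static int pc_default_node(unsigned pkg_id, int nb_numa_nodes)
{
    return pkg_id % nb_numa_nodes;
}

/* spapr: possible_cpus entries are per core; derive a "socket" number
 * from the core's first cpu_index (core_id) */
static int spapr_default_node(unsigned core_id, int smp_threads,
                              int smp_cores, int nb_numa_nodes)
{
    return core_id / smp_threads / smp_cores % nb_numa_nodes;
}
```

With the patch applied, these values are preset into possible_cpus[].props.node_id, and the generic loop in parse_numa_opts() only reads them back via cpu_index_to_instance_props().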
On Wed, 3 May 2017 11:42:40 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> > Originally CPU threads were by default assigned in
> > round-robin fashion. However it was causing issues in
> > guest since CPU threads from the same socket/core could
> > be placed on different NUMA nodes.
> > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > fixed it by grouping threads within a socket on the same node
> > introducing cpu_index_to_socket_id() callback and commit
> > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > reused callback to fix similar issues for SPAPR machine
> > even though socket doesn't make much sense there.
> >
> > As result QEMU ended up having 3 default distribution rules
> > used by 3 targets /virt-arm, spapr, pc/.
> >
> > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > generalize default mapping in numa.c by making boards decide
> > on default mapping and let them explicitly tell generic
> > numa code to which node a CPU thread belongs to by replacing
> > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > which provides default node_id assigned by board to specified
> > cpu_index.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> Just two extra comments below:
>
> [...]
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +
> [...]
> > +static CpuInstanceProperties
> > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >  {
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> >  }
>
> The fact that these two implementations look exactly the same
> made me wonder:
>
> 1) Why this isn't the default implementation;
> 2) Why exactly spapr needs a different implementation.
>
> Then I noticed that there's nothing in the common machine code
> that specifies that possible_cpus->cpus[] is indexed by
> cpu_index. This means it is indeed safer to require each machine
> to provide its own cpu_index_to_props implementation than having
> a default implementation that can unexpectedly break (e.g. if
> granularity at possible_cpus is not at VCPU/thread level).
>
> I would still like to have an abstraction that wouldn't require
> writing machine-specific code (e.g. cpu_index ranges to
> possible_cpus like David suggested), but that's for a follow-up
> series.

I've left generalizing this part until we have a defined/stable
topology for aarch64. Also, I hope that instead of generalizing we
might deprecate/drop -numa node,cpus=0,1,... in the future and just
drop these hooks altogether, as they are there only for compatibility
with the old cpu_index-based mapping.

> [...]
> > +static CpuInstanceProperties
> > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> >  {
> > +    CPUArchId *core_slot;
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +
> > +    /* make sure possible_cpu are intialized */
> > +    mc->possible_cpu_arch_ids(machine);
> > +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> > +    assert(core_slot);
> > +    return core_slot->props;
> >  }
>
> If you need to submit v3, maybe a comment here explaining why
> spapr needs a different cpu_index_to_props implementation would
> be helpful. I took a while to figure it out.

I'll add a comment if the series is respinned.
On Wed, 3 May 2017 11:59:35 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> [...]
> > @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
> >       * rule grouping VCPUs by socket so that VCPUs from the same socket
> >       * would be on the same node.
> >       */
> > +    if (!mc->cpu_index_to_instance_props) {
> > +        error_report("default CPUs to NUMA node mapping isn't supported");
> > +        exit(1);
> > +    }
>
> This will make people trying to use -numa on unsupported machines
> see a misleading error message: instead of telling them that the
> machine doesn't support NUMA at all, the message seems to imply
> that NUMA may be supported and we just don't have default NUMA node
> mapping support.
>
> Probably a more generic "NUMA is not supported by this
> machine-type" message before even trying to parse -numa would be
> clearer. (I don't know if another patch in this series already
> does that.)

No, other places should error out with their own error messages if
NUMA is not supported at all.
On Wed, May 03, 2017 at 11:42:40AM -0300, Eduardo Habkost wrote:
> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> > Originally CPU threads were by default assigned in
> > round-robin fashion. However it was causing issues in
> > guest since CPU threads from the same socket/core could
> > be placed on different NUMA nodes.
> > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > fixed it by grouping threads within a socket on the same node
> > introducing cpu_index_to_socket_id() callback and commit
> > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > reused callback to fix similar issues for SPAPR machine
> > even though socket doesn't make much sense there.
> >
> > As result QEMU ended up having 3 default distribution rules
> > used by 3 targets /virt-arm, spapr, pc/.
> >
> > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > generalize default mapping in numa.c by making boards decide
> > on default mapping and let them explicitly tell generic
> > numa code to which node a CPU thread belongs to by replacing
> > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > which provides default node_id assigned by board to specified
> > cpu_index.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>
> Just two extra comments below:
>
> [...]
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +
> [...]
> > +static CpuInstanceProperties
> > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >  {
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> >  }
>
> The fact that these two implementations look exactly the same
> made me wonder:
>
> 1) Why this isn't the default implementation;
> 2) Why exactly spapr needs a different implementation.
>
> Then I noticed that there's nothing in the common machine code
> that specifies that possible_cpus->cpus[] is indexed by
> cpu_index. This means it is indeed safer to require each machine
> to provide its own cpu_index_to_props implementation than having
> a default implementation that can unexpectedly break (e.g. if
> granularity at possible_cpus is not at VCPU/thread level).
>
> I would still like to have an abstraction that wouldn't require
> writing machine-specific code (e.g. cpu_index ranges to
> possible_cpus like David suggested), but that's for a follow-up
> series.

Yeah, that similarity bothered me too, but like you I realised the
problem is that spapr simply doesn't have the same granularity of
information as x86 and ARM - there's only one entry per core for PAPR
instead of one per thread.

So, we do need a machine-specific mapping of cpu_index to location
properties, which is what the callback is for.

It does occur to me that another way of accomplishing that would be
for possible_cpu_arch_ids() to create a cpu_index->props mapping as a
simple array of pointers, in addition to the list of possible props
structures.

Not sure if that would end up looking better or not.

> [...]
> > +static CpuInstanceProperties
> > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> >  {
> > +    CPUArchId *core_slot;
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +
> > +    /* make sure possible_cpu are intialized */
> > +    mc->possible_cpu_arch_ids(machine);
> > +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> > +    assert(core_slot);
> > +    return core_slot->props;
> >  }
>
> If you need to submit v3, maybe a comment here explaining why
> spapr needs a different cpu_index_to_props implementation would
> be helpful. I took a while to figure it out.
>
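[Editorial note: David's alternative, having possible_cpu_arch_ids() also build a cpu_index->props mapping as an array of pointers, could be sketched roughly as below. The types are simplified stand-ins for QEMU's CpuInstanceProperties/CPUArchId, and the sPAPR-like case of one possible_cpus entry per core is assumed; this is an illustration of the idea, not actual QEMU code.]

```c
#include <assert.h>

/* Simplified stand-ins for QEMU's types */
typedef struct { int node_id; } CpuInstanceProperties;
typedef struct { CpuInstanceProperties props; } CPUArchId;

enum { MAX_CORES = 4, MAX_THREADS = 32 };

typedef struct {
    int len;                           /* entries in cpus[], one per core */
    CPUArchId cpus[MAX_CORES];
    /* per-thread index built alongside the per-core list */
    CpuInstanceProperties *by_cpu_index[MAX_THREADS];
} CPUArchIdList;

/* Point each cpu_index (thread) at the props of its owning core, so
 * generic code could map cpu_index -> props without a machine callback. */
static void build_cpu_index_map(CPUArchIdList *list, int smp_threads)
{
    for (int i = 0; i < list->len * smp_threads; i++) {
        list->by_cpu_index[i] = &list->cpus[i / smp_threads].props;
    }
}
```

As the thread notes below, the series deliberately does not go this way: the callbacks exist only for the legacy cpu_index-based mapping, so growing possible_cpus with an extra structure was judged not worth it.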
On Wed, 3 May 2017 12:13:21 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> [...]
> > diff --git a/numa.c b/numa.c
> > index 6fc2393..ab1661d 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
> >      g_free(seen_cpus);
> >  }
> >
> > -void parse_numa_opts(MachineClass *mc)
> > +void parse_numa_opts(MachineState *ms)
> >  {
> >      int i;
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> >
> >      for (i = 0; i < MAX_NODES; i++) {
> >          numa_info[i].node_cpu = bitmap_new(max_cpus);
> > @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>
> Expanding diff context:
>
> >      /* Historically VCPUs were assigned in round-robin order to NUMA
> >       * nodes. However it causes issues with guest not handling it nice
> >       * in case where cores/threads from a multicore CPU appear on
> >       * different nodes. So allow boards to override default distribution
> >       * rule grouping VCPUs by socket so that VCPUs from the same socket
> >       * would be on the same node.
> >       */
>
> The above comment looks obsolete, as we are removing the code inside
> parse_numa_opts() that deals with grouping VCPUs by socket.
>
> (Can be fixed by a follow-up patch, if necessary.)

It looks like it; I'll respin the series, so I'll drop the comment on
respin.

Looking more at setting the default mapping, it should be possible to
remove it from parse_numa_opts() altogether after this series.
Enabling the machine's predefined default mapping won't need the
cpu_index_to_instance_props() translation; we just need to set
'has_node_id = true' in possible_cpus for all CPUs. But I'd like to do
this cleanup on top of this series.

> > +    if (!mc->cpu_index_to_instance_props) {
> > +        error_report("default CPUs to NUMA node mapping isn't supported");
> > +        exit(1);
> > +    }
> >      if (i == nb_numa_nodes) {
> >          for (i = 0; i < max_cpus; i++) {
> > -            unsigned node_id = i % nb_numa_nodes;
> > -            if (mc->cpu_index_to_socket_id) {
> > -                node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> > -            }
> > +            CpuInstanceProperties props;
> > +            props = mc->cpu_index_to_instance_props(ms, i);
> >
> > -            set_bit(i, numa_info[node_id].node_cpu);
> > +            set_bit(i, numa_info[props.node_id].node_cpu);
> >          }
> >      }
> >
> [...]
>
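[Editorial note: the follow-up cleanup Igor describes, where the board's preset node_id values in possible_cpus are simply enabled when the user gave no manual mapping, might look roughly like this. Simplified stand-in types and a hypothetical helper name; a sketch of the idea, not the actual QEMU change.]

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for QEMU's CpuInstanceProperties */
typedef struct {
    bool has_node_id;
    int node_id;   /* preset by the board, e.g. pkg_id % nb_numa_nodes */
} CpuInstanceProperties;

/* Enable the board's preset default mapping for every possible CPU.
 * Intended to run only when no manual -numa node,cpus= mapping was
 * given on the command line. */
static void enable_default_mapping(CpuInstanceProperties *cpus, int len)
{
    for (int i = 0; i < len; i++) {
        cpus[i].has_node_id = true;
    }
}
```

With that, generic numa code would read node assignments straight out of possible_cpus and the cpu_index_to_instance_props() translation would no longer be needed on the default path.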
On Thu, 4 May 2017 17:32:13 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 03, 2017 at 11:42:40AM -0300, Eduardo Habkost wrote:
> > On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> > > Originally CPU threads were by default assigned in
> > > round-robin fashion. However it was causing issues in
> > > guest since CPU threads from the same socket/core could
> > > be placed on different NUMA nodes.
> > > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > > fixed it by grouping threads within a socket on the same node
> > > introducing cpu_index_to_socket_id() callback and commit
> > > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > > reused callback to fix similar issues for SPAPR machine
> > > even though socket doesn't make much sense there.
> > >
> > > As result QEMU ended up having 3 default distribution rules
> > > used by 3 targets /virt-arm, spapr, pc/.
> > >
> > > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > > generalize default mapping in numa.c by making boards decide
> > > on default mapping and let them explicitly tell generic
> > > numa code to which node a CPU thread belongs to by replacing
> > > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > > which provides default node_id assigned by board to specified
> > > cpu_index.
> > >
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >
> > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> >
> > Just two extra comments below:
> >
> > [...]
> > > +static CpuInstanceProperties
> > > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > +{
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;;
> > > +}
> > > +
> > [...]
> > > +static CpuInstanceProperties
> > > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > >  {
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;;
> > >  }
> >
> > The fact that these two implementations look exactly the same
> > made me wonder:
> >
> > 1) Why this isn't the default implementation;
> > 2) Why exactly spapr needs a different implementation.
> >
> > Then I noticed that there's nothing in the common machine code
> > that specifies that possible_cpus->cpus[] is indexed by
> > cpu_index. This means it is indeed safer to require each machine
> > to provide its own cpu_index_to_props implementation than having
> > a default implementation that can unexpectedly break (e.g. if
> > granularity at possible_cpus is not at VCPU/thread level).
> >
> > I would still like to have an abstraction that wouldn't require
> > writing machine-specific code (e.g. cpu_index ranges to
> > possible_cpus like David suggested), but that's for a follow-up
> > series.
>
> Yeah, that similarity bothered me too, but like you I realised the
> problem is that spapr simply doesn't have the same granularity of
> information as x86 and ARM - there's only one entry per core for PAPR
> instead of one per thread.
>
> So, we do need a machine-specific mapping of cpu_index to location
> properties, which is what the callback is for.
>
> It does occur to me that another way of accomplishing that would be
> for possible_cpu_arch_ids() to create a cpu_index->props mapping as a
> simple array of pointers, in addition to the list of possible props
> structures.
>
> Not sure if that would end up looking better or not.

I'd like to remove the foo_cpu_index_to_props() callbacks in the
future; to do so we would need to obsolete and remove the
cpu_index-based '-numa node,cpus='. Let's give the new interface a
year to settle in and then remove the old one. So I'd rather not
extend possible_cpus with a more complex structure and the related
management routines.

> > [...]
> > > +static CpuInstanceProperties
> > > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> > >  {
> > > +    CPUArchId *core_slot;
> > > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > > +
> > > +    /* make sure possible_cpu are intialized */
> > > +    mc->possible_cpu_arch_ids(machine);
> > > +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> > > +    assert(core_slot);
> > > +    return core_slot->props;
> > >  }
> >
> > If you need to submit v3, maybe a comment here explaining why
> > spapr needs a different cpu_index_to_props implementation would
> > be helpful. I took a while to figure it out.
> >
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 31d9c72..5d6af21 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -74,7 +74,10 @@ typedef struct {
  *    of HotplugHandler object, which handles hotplug operation
  *    for a given @dev. It may return NULL if @dev doesn't require
  *    any actions to be performed by hotplug handler.
- * @cpu_index_to_socket_id:
+ * @cpu_index_to_instance_props:
+ *    used to provide @cpu_index to socket/core/thread number mapping, allowing
+ *    legacy code to perform maping from cpu_index to topology properties
+ *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
  *    used to provide @cpu_index to socket number mapping, allowing
  *    a machine to group CPU threads belonging to the same socket/package
  *    Returns: socket number given cpu_index belongs to.
@@ -139,7 +142,8 @@ struct MachineClass {
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
-    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
+    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
+                                                         unsigned cpu_index);
     const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
 };
 
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 8f09dcf..46ea6c7 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -24,7 +24,7 @@ typedef struct node_info {
 } NodeInfo;
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineClass *mc);
+void parse_numa_opts(MachineState *ms);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index acc748e..3e19b5f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1539,6 +1539,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
     }
 }
 
+static CpuInstanceProperties
+virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;;
+}
+
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
 {
     int n;
@@ -1558,8 +1568,13 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[n].props.has_thread_id = true;
         ms->possible_cpus->cpus[n].props.thread_id = n;
 
-        /* TODO: add 'has_node/node' here to describe
-           to which node core belongs */
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa init code will enable them later if manual mapping wasn't
+             * present on CLI */
+            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;
+        }
     }
     return ms->possible_cpus;
 }
@@ -1581,6 +1596,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
     mc->minimum_page_bits = 12;
     mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
 }
 
 static const TypeInfo virt_machine_info = {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f3b372a..9cec0c1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2243,12 +2243,14 @@ static void pc_machine_reset(void)
     }
 }
 
-static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
+static CpuInstanceProperties
+pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
-    X86CPUTopoInfo topo;
-    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
-                          &topo);
-    return topo.pkg_id;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;;
 }
 
 static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
@@ -2280,6 +2282,15 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
         ms->possible_cpus->cpus[i].props.has_thread_id = true;
         ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
+
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa init code will enable them later if manual mapping wasn't
+             * present on CLI */
+            ms->possible_cpus->cpus[i].props.node_id =
+                topo.pkg_id % nb_numa_nodes;
+        }
     }
     return ms->possible_cpus;
 }
@@ -2322,7 +2333,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->acpi_data_size = 0x20000 + 0x8000;
     pcmc->save_tsc_khz = true;
     mc->get_hotplug_handler = pc_get_hotpug_handler;
-    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
+    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
     mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 80d12d0..33405a0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2981,11 +2981,17 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
     return NULL;
 }
 
-static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
+static CpuInstanceProperties
+spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
 {
-    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
-     * socket means much for the paravirtualized PAPR platform) */
-    return cpu_index / smp_threads / smp_cores;
+    CPUArchId *core_slot;
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+
+    /* make sure possible_cpu are intialized */
+    mc->possible_cpu_arch_ids(machine);
+    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
+    assert(core_slot);
+    return core_slot->props;
 }
 
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
@@ -3012,8 +3018,15 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
         machine->possible_cpus->cpus[i].arch_id = core_id;
         machine->possible_cpus->cpus[i].props.has_core_id = true;
         machine->possible_cpus->cpus[i].props.core_id = core_id;
-        /* TODO: add 'has_node/node' here to describe
-           to which node core belongs */
+
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa init code will enable them later if manual mapping wasn't
+             * present on CLI */
+            machine->possible_cpus->cpus[i].props.node_id =
+                core_id / smp_threads / smp_cores % nb_numa_nodes;
+        }
     }
     return machine->possible_cpus;
 }
@@ -3138,7 +3151,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     hc->pre_plug = spapr_machine_device_pre_plug;
     hc->plug = spapr_machine_device_plug;
     hc->unplug = spapr_machine_device_unplug;
-    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
+    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
     mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
     hc->unplug_request = spapr_machine_device_unplug_request;
 
diff --git a/numa.c b/numa.c
index 6fc2393..ab1661d 100644
--- a/numa.c
+++ b/numa.c
@@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
     g_free(seen_cpus);
 }
 
-void parse_numa_opts(MachineClass *mc)
+void parse_numa_opts(MachineState *ms)
 {
     int i;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
 
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_cpu = bitmap_new(max_cpus);
@@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
      * rule grouping VCPUs by socket so that VCPUs from the same socket
      * would be on the same node.
      */
+    if (!mc->cpu_index_to_instance_props) {
+        error_report("default CPUs to NUMA node mapping isn't supported");
+        exit(1);
+    }
     if (i == nb_numa_nodes) {
         for (i = 0; i < max_cpus; i++) {
-            unsigned node_id = i % nb_numa_nodes;
-            if (mc->cpu_index_to_socket_id) {
-                node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
-            }
+            CpuInstanceProperties props;
+            props = mc->cpu_index_to_instance_props(ms, i);
 
-            set_bit(i, numa_info[node_id].node_cpu);
+            set_bit(i, numa_info[props.node_id].node_cpu);
         }
     }
 
diff --git a/vl.c b/vl.c
index f46e070..c63f4d5 100644
--- a/vl.c
+++ b/vl.c
@@ -4506,7 +4506,7 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts(machine_class);
+    parse_numa_opts(current_machine);
 
     if (qemu_opts_foreach(qemu_find_opts("mon"),
                           mon_init_func, NULL, NULL)) {
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
Patch only moves source of default mapping to possible_cpus[]
and leaves the rest of NUMA handling to numa_info[node_id].node_cpu
bitmaps. It's up to follow-up patches to replace bitmaps with
possible_cpus[] internally.

v3:
  - drop duplicate ';' (Drew)
v2:
  - use cpu_index instead of core_id directly in
    spapr_cpu_index_to_props() (David Gibson)
  - amend comment message s/board/numa init code/ (David Gibson)
---
 include/hw/boards.h   |  8 ++++++--
 include/sysemu/numa.h |  2 +-
 hw/arm/virt.c         | 20 ++++++++++++++++++--
 hw/i386/pc.c          | 23 +++++++++++++++++------
 hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
 numa.c                | 15 +++++++++------
 vl.c                  |  2 +-
 7 files changed, 72 insertions(+), 25 deletions(-)