diff mbox

[v2,05/24] numa: move source of default CPUs to NUMA node mapping into boards

Message ID 1493816238-33120-6-git-send-email-imammedo@redhat.com
State New
Headers show

Commit Message

Igor Mammedov May 3, 2017, 12:56 p.m. UTC
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.

As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.

In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
Patch only moves source of default mapping to possible_cpus[]
and leaves the rest of NUMA handling to numa_info[node_id].node_cpu
bitmaps. It's up to follow up patches to replace bitmaps
with possible_cpus[] internally.

v3:
  - drop duplicate ';' (Drew)
v2:
  - use cpu_index instead of core_id directly in spapr_cpu_index_to_props()
       (David Gibson)
  - ammend comment message s/board/numa init code/
       (David Gibson)
---
 include/hw/boards.h   |  8 ++++++--
 include/sysemu/numa.h |  2 +-
 hw/arm/virt.c         | 20 ++++++++++++++++++--
 hw/i386/pc.c          | 23 +++++++++++++++++------
 hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
 numa.c                | 15 +++++++++------
 vl.c                  |  2 +-
 7 files changed, 72 insertions(+), 25 deletions(-)

Comments

Eduardo Habkost May 3, 2017, 2:42 p.m. UTC | #1
On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> Originally CPU threads were by default assigned in
> round-robin fashion. However it was causing issues in
> guest since CPU threads from the same socket/core could
> be placed on different NUMA nodes.
> Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> fixed it by grouping threads within a socket on the same node
> introducing cpu_index_to_socket_id() callback and commit
> 20bb648d (spapr: Fix default NUMA node allocation for threads)
> reused callback to fix similar issues for SPAPR machine
> even though socket doesn't make much sense there.
> 
> As result QEMU ended up having 3 default distribution rules
> used by 3 targets /virt-arm, spapr, pc/.
> 
> In effort of moving NUMA mapping for CPUs into possible_cpus,
> generalize default mapping in numa.c by making boards decide
> on default mapping and let them explicitly tell generic
> numa code to which node a CPU thread belongs to by replacing
> cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> which provides default node_id assigned by board to specified
> cpu_index.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

Just two extra comments below:

[...]
> +static CpuInstanceProperties
> +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
> +}
> +
[...]
> +static CpuInstanceProperties
> +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
>  }

The fact that these two implementations look exactly the same
made me wonder:

1) Why this isn't the default implementation;
2) Why exactly spapr needs a different implementation.

Then I noticed that there's nothing in the common machine code
that specifies that possible_cpus->cpus[] is indexed by
cpu_index. This means it is indeed safer to require each machine
to provide its own cpu_index_to_props implementation than having
a default implementation that can unexpectedly break (e.g. if
granularity at possible_cpus is not at VCPU/thread level).

I would still like to have an abstraction that wouldn't require
writing machine-specific code (e.g. cpu_index ranges to
possible_cpus like David suggested), but that's for a follow-up
series.

[...]
> +static CpuInstanceProperties
> +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>  {
> +    CPUArchId *core_slot;
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +
> +    /* make sure possible_cpu are intialized */
> +    mc->possible_cpu_arch_ids(machine);
> +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> +    assert(core_slot);
> +    return core_slot->props;
>  }

If you need to submit v3, maybe a comment here explaining why
spapr needs a different cpu_index_to_props implementation would
be helpful. I took a while to figure it out.
Eduardo Habkost May 3, 2017, 2:59 p.m. UTC | #2
On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
[...]
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */
> +        if (!mc->cpu_index_to_instance_props) {
> +            error_report("default CPUs to NUMA node mapping isn't supported");
> +            exit(1);
> +        }

This will make people trying to use -numa on unsupported machines
see a misleading error message: instead of telling them that the
machine doesn't support NUMA at all, the message seems to imply
that NUMA may be supported and we just don't have default NUMA node
mapping support.

Probably a more generic "NUMA is not supported by this
machine-type" message before even trying to parse -numa would be
clearer. (I don't know if another patch in this series already
does that.)
Eduardo Habkost May 3, 2017, 3:13 p.m. UTC | #3
On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
[...]
> diff --git a/numa.c b/numa.c
> index 6fc2393..ab1661d 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>  
> -void parse_numa_opts(MachineClass *mc)
> +void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_cpu = bitmap_new(max_cpus);
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)


Expanding diff context:

>          /* Historically VCPUs were assigned in round-robin order to NUMA
>           * nodes. However it causes issues with guest not handling it nice
>           * in case where cores/threads from a multicore CPU appear on
>           * different nodes. So allow boards to override default distribution
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */

The above comment looks obsolete, as we are removing the code inside
parse_numa_opts() that deals with grouping VCPUs by socket.

(Can be fixed by a follow-up patch, if necessary.)

> +        if (!mc->cpu_index_to_instance_props) {
> +            error_report("default CPUs to NUMA node mapping isn't supported");
> +            exit(1);
> +        }
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
> -                unsigned node_id = i % nb_numa_nodes;
> -                if (mc->cpu_index_to_socket_id) {
> -                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> -                }
> +                CpuInstanceProperties props;
> +                props = mc->cpu_index_to_instance_props(ms, i);
>  
> -                set_bit(i, numa_info[node_id].node_cpu);
> +                set_bit(i, numa_info[props.node_id].node_cpu);
>              }
>          }
>  
[...]
Igor Mammedov May 3, 2017, 3:57 p.m. UTC | #4
On Wed, 3 May 2017 11:42:40 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> > Originally CPU threads were by default assigned in
> > round-robin fashion. However it was causing issues in
> > guest since CPU threads from the same socket/core could
> > be placed on different NUMA nodes.
> > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > fixed it by grouping threads within a socket on the same node
> > introducing cpu_index_to_socket_id() callback and commit
> > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > reused callback to fix similar issues for SPAPR machine
> > even though socket doesn't make much sense there.
> > 
> > As result QEMU ended up having 3 default distribution rules
> > used by 3 targets /virt-arm, spapr, pc/.
> > 
> > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > generalize default mapping in numa.c by making boards decide
> > on default mapping and let them explicitly tell generic
> > numa code to which node a CPU thread belongs to by replacing
> > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > which provides default node_id assigned by board to specified
> > cpu_index.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> Just two extra comments below:
> 
> [...]
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +  
> [...]
> > +static CpuInstanceProperties
> > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >  {
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> >  }  
> 
> The fact that these two implementations look exactly the same
> made me wonder:
> 
> 1) Why this isn't the default implementation;
> 2) Why exactly spapr needs a different implementation.
> 
> Then I noticed that there's nothing in the common machine code
> that specifies that possible_cpus->cpus[] is indexed by
> cpu_index. This means it is indeed safer to require each machine
> to provide its own cpu_index_to_props implementation than having
> a default implementation that can unexpectedly break (e.g. if
> granularity at possible_cpus is not at VCPU/thread level).
> 
> I would still like to have an abstraction that wouldn't require
> writing machine-specific code (e.g. cpu_index ranges to
> possible_cpus like David suggested), but that's for a follow-up
> series.
I've left generalizing this part until we have defined/stable
topology for aarch64.
Also I hope, that instead of generalizing we might
depricate/drop -numa node,cpus=0,1,... in future and just drop
this hooks altogether as they are there only for compatibility
with old cpu_index based mapping


> [...]
> > +static CpuInstanceProperties
> > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> >  {
> > +    CPUArchId *core_slot;
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +
> > +    /* make sure possible_cpu are intialized */
> > +    mc->possible_cpu_arch_ids(machine);
> > +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> > +    assert(core_slot);
> > +    return core_slot->props;
> >  }  
> 
> If you need to submit v3, maybe a comment here explaining why
> spapr needs a different cpu_index_to_props implementation would
> be helpful. I took a while to figure it out.
I'll add comment if series is respinned.
Igor Mammedov May 3, 2017, 4:14 p.m. UTC | #5
On Wed, 3 May 2017 11:59:35 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> [...]
> > @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
> >           * rule grouping VCPUs by socket so that VCPUs from the same socket
> >           * would be on the same node.
> >           */
> > +        if (!mc->cpu_index_to_instance_props) {
> > +            error_report("default CPUs to NUMA node mapping isn't supported");
> > +            exit(1);
> > +        }  
> 
> This will make people trying to use -numa on unsupported machines
> see a misleading error message: instead of telling them that the
> machine doesn't support NUMA at all, the message seems to imply
> that NUMA may be supported and we just don't have default NUMA node
> mapping support.
> 
> Probably a more generic "NUMA is not supported by this
> machine-type" message before even trying to parse -numa would be
> clearer. (I don't know if another patch in this series already
> does that.)
no, other places should error out with other error messages if it's
not supported.
David Gibson May 4, 2017, 7:32 a.m. UTC | #6
On Wed, May 03, 2017 at 11:42:40AM -0300, Eduardo Habkost wrote:
> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> > Originally CPU threads were by default assigned in
> > round-robin fashion. However it was causing issues in
> > guest since CPU threads from the same socket/core could
> > be placed on different NUMA nodes.
> > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > fixed it by grouping threads within a socket on the same node
> > introducing cpu_index_to_socket_id() callback and commit
> > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > reused callback to fix similar issues for SPAPR machine
> > even though socket doesn't make much sense there.
> > 
> > As result QEMU ended up having 3 default distribution rules
> > used by 3 targets /virt-arm, spapr, pc/.
> > 
> > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > generalize default mapping in numa.c by making boards decide
> > on default mapping and let them explicitly tell generic
> > numa code to which node a CPU thread belongs to by replacing
> > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > which provides default node_id assigned by board to specified
> > cpu_index.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> Just two extra comments below:
> 
> [...]
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +
> [...]
> > +static CpuInstanceProperties
> > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >  {
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> >  }
> 
> The fact that these two implementations look exactly the same
> made me wonder:
> 
> 1) Why this isn't the default implementation;
> 2) Why exactly spapr needs a different implementation.
> 
> Then I noticed that there's nothing in the common machine code
> that specifies that possible_cpus->cpus[] is indexed by
> cpu_index. This means it is indeed safer to require each machine
> to provide its own cpu_index_to_props implementation than having
> a default implementation that can unexpectedly break (e.g. if
> granularity at possible_cpus is not at VCPU/thread level).
> 
> I would still like to have an abstraction that wouldn't require
> writing machine-specific code (e.g. cpu_index ranges to
> possible_cpus like David suggested), but that's for a follow-up
> series.

Yeah, that similarity bothered me to, but like you I realised the
problem is that spapr simply doesn't have the same granularity of
information as x86 and ARM - there's only one entry per core for PAPR
instead of one per thread.

So, we do need a machine specific mapping of cpu_index to location
properties, which is what the callback is for.

It does occur to me that another way of accomplishing that would be
for possible_cpu_arch_ids() to create a cpu_index->props mapping as a
simple array ofpointers, in addition to the list of possiblee props
structures.

Not sure if that would end up looking better or not.

> [...]
> > +static CpuInstanceProperties
> > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> >  {
> > +    CPUArchId *core_slot;
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +
> > +    /* make sure possible_cpu are intialized */
> > +    mc->possible_cpu_arch_ids(machine);
> > +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> > +    assert(core_slot);
> > +    return core_slot->props;
> >  }
> 
> If you need to submit v3, maybe a comment here explaining why
> spapr needs a different cpu_index_to_props implementation would
> be helpful. I took a while to figure it out.
>
Igor Mammedov May 4, 2017, 9:19 a.m. UTC | #7
On Wed, 3 May 2017 12:13:21 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:
> [...]
> > diff --git a/numa.c b/numa.c
> > index 6fc2393..ab1661d 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
> >      g_free(seen_cpus);
> >  }
> >  
> > -void parse_numa_opts(MachineClass *mc)
> > +void parse_numa_opts(MachineState *ms)
> >  {
> >      int i;
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> >  
> >      for (i = 0; i < MAX_NODES; i++) {
> >          numa_info[i].node_cpu = bitmap_new(max_cpus);
> > @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)  
> 
> 
> Expanding diff context:
> 
> >          /* Historically VCPUs were assigned in round-robin order to NUMA
> >           * nodes. However it causes issues with guest not handling it nice
> >           * in case where cores/threads from a multicore CPU appear on
> >           * different nodes. So allow boards to override default distribution
> >           * rule grouping VCPUs by socket so that VCPUs from the same socket
> >           * would be on the same node.
> >           */  
> 
> The above comment looks obsolete, as we are removing the code inside
> parse_numa_opts() that deals with grouping VCPUs by socket.
> 
> (Can be fixed by a follow-up patch, if necessary.)
it looks like, I'll respin series.
So I'll drop it on respin.

Looking at setting default mapping more,
it should be possible to remove it from parse_numa_opts()
altogether after this series.
Enable predefined by machine default mapping won't need
cpu_index_to_instance_props() translation. We just need
to set 'has_node_id = true' in possible_cpus for all cpus.

But I'd like to do this cleanup on top of this series.

> 
> > +        if (!mc->cpu_index_to_instance_props) {
> > +            error_report("default CPUs to NUMA node mapping isn't supported");
> > +            exit(1);
> > +        }
> >          if (i == nb_numa_nodes) {
> >              for (i = 0; i < max_cpus; i++) {
> > -                unsigned node_id = i % nb_numa_nodes;
> > -                if (mc->cpu_index_to_socket_id) {
> > -                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> > -                }
> > +                CpuInstanceProperties props;
> > +                props = mc->cpu_index_to_instance_props(ms, i);
> >  
> > -                set_bit(i, numa_info[node_id].node_cpu);
> > +                set_bit(i, numa_info[props.node_id].node_cpu);
> >              }
> >          }
> >    
> [...]
>
Igor Mammedov May 5, 2017, 8:01 a.m. UTC | #8
On Thu, 4 May 2017 17:32:13 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, May 03, 2017 at 11:42:40AM -0300, Eduardo Habkost wrote:
> > On Wed, May 03, 2017 at 02:56:59PM +0200, Igor Mammedov wrote:  
> > > Originally CPU threads were by default assigned in
> > > round-robin fashion. However it was causing issues in
> > > guest since CPU threads from the same socket/core could
> > > be placed on different NUMA nodes.
> > > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > > fixed it by grouping threads within a socket on the same node
> > > introducing cpu_index_to_socket_id() callback and commit
> > > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > > reused callback to fix similar issues for SPAPR machine
> > > even though socket doesn't make much sense there.
> > > 
> > > As result QEMU ended up having 3 default distribution rules
> > > used by 3 targets /virt-arm, spapr, pc/.
> > > 
> > > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > > generalize default mapping in numa.c by making boards decide
> > > on default mapping and let them explicitly tell generic
> > > numa code to which node a CPU thread belongs to by replacing
> > > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > > which provides default node_id assigned by board to specified
> > > cpu_index.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> > 
> > Just two extra comments below:
> > 
> > [...]  
> > > +static CpuInstanceProperties
> > > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > +{
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;;
> > > +}
> > > +  
> > [...]  
> > > +static CpuInstanceProperties
> > > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > >  {
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;;
> > >  }  
> > 
> > The fact that these two implementations look exactly the same
> > made me wonder:
> > 
> > 1) Why this isn't the default implementation;
> > 2) Why exactly spapr needs a different implementation.
> > 
> > Then I noticed that there's nothing in the common machine code
> > that specifies that possible_cpus->cpus[] is indexed by
> > cpu_index. This means it is indeed safer to require each machine
> > to provide its own cpu_index_to_props implementation than having
> > a default implementation that can unexpectedly break (e.g. if
> > granularity at possible_cpus is not at VCPU/thread level).
> > 
> > I would still like to have an abstraction that wouldn't require
> > writing machine-specific code (e.g. cpu_index ranges to
> > possible_cpus like David suggested), but that's for a follow-up
> > series.  
> 
> Yeah, that similarity bothered me to, but like you I realised the
> problem is that spapr simply doesn't have the same granularity of
> information as x86 and ARM - there's only one entry per core for PAPR
> instead of one per thread.
> 
> So, we do need a machine specific mapping of cpu_index to location
> properties, which is what the callback is for.
> 
> It does occur to me that another way of accomplishing that would be
> for possible_cpu_arch_ids() to create a cpu_index->props mapping as a
> simple array ofpointers, in addition to the list of possiblee props
> structures.
> 
> Not sure if that would end up looking better or not.
I'd like to remove foo_cpu_index_to_props() callbacks in future,
to do so we would need to obsolete and remove cpu_index based
'-numa node,cpus='. Lets give a year for new interface to settle
in and then remove old interface.

so I'd rather do not extend possible_cpus with more complex structure
and related mgmt routines.



> 
> > [...]  
> > > +static CpuInstanceProperties
> > > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> > >  {
> > > +    CPUArchId *core_slot;
> > > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > > +
> > > +    /* make sure possible_cpu are intialized */
> > > +    mc->possible_cpu_arch_ids(machine);
> > > +    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
> > > +    assert(core_slot);
> > > +    return core_slot->props;
> > >  }  
> > 
> > If you need to submit v3, maybe a comment here explaining why
> > spapr needs a different cpu_index_to_props implementation would
> > be helpful. I took a while to figure it out.
> >   
>
diff mbox

Patch

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 31d9c72..5d6af21 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -74,7 +74,10 @@  typedef struct {
  *    of HotplugHandler object, which handles hotplug operation
  *    for a given @dev. It may return NULL if @dev doesn't require
  *    any actions to be performed by hotplug handler.
- * @cpu_index_to_socket_id:
+ * @cpu_index_to_instance_props:
+ *    used to provide @cpu_index to socket/core/thread number mapping, allowing
+ *    legacy code to perform maping from cpu_index to topology properties
+ *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
  *    used to provide @cpu_index to socket number mapping, allowing
  *    a machine to group CPU threads belonging to the same socket/package
  *    Returns: socket number given cpu_index belongs to.
@@ -139,7 +142,8 @@  struct MachineClass {
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
-    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
+    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
+                                                         unsigned cpu_index);
     const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
 };
 
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 8f09dcf..46ea6c7 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -24,7 +24,7 @@  typedef struct node_info {
 } NodeInfo;
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineClass *mc);
+void parse_numa_opts(MachineState *ms);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index acc748e..3e19b5f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1539,6 +1539,16 @@  static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
     }
 }
 
+static CpuInstanceProperties
+virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;;
+}
+
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
 {
     int n;
@@ -1558,8 +1568,13 @@  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[n].props.has_thread_id = true;
         ms->possible_cpus->cpus[n].props.thread_id = n;
 
-        /* TODO: add 'has_node/node' here to describe
-           to which node core belongs */
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa init code will enable them later if manual mapping wasn't
+             * present on CLI */
+            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;
+        }
     }
     return ms->possible_cpus;
 }
@@ -1581,6 +1596,7 @@  static void virt_machine_class_init(ObjectClass *oc, void *data)
     /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
     mc->minimum_page_bits = 12;
     mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
 }
 
 static const TypeInfo virt_machine_info = {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f3b372a..9cec0c1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2243,12 +2243,14 @@  static void pc_machine_reset(void)
     }
 }
 
-static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
+static CpuInstanceProperties
+pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
-    X86CPUTopoInfo topo;
-    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
-                          &topo);
-    return topo.pkg_id;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;;
 }
 
 static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
@@ -2280,6 +2282,15 @@  static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
         ms->possible_cpus->cpus[i].props.has_thread_id = true;
         ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
+
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa init code will enable them later if manual mapping wasn't
+             * present on CLI */
+            ms->possible_cpus->cpus[i].props.node_id =
+                topo.pkg_id % nb_numa_nodes;
+        }
     }
     return ms->possible_cpus;
 }
@@ -2322,7 +2333,7 @@  static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->acpi_data_size = 0x20000 + 0x8000;
     pcmc->save_tsc_khz = true;
     mc->get_hotplug_handler = pc_get_hotpug_handler;
-    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
+    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
     mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 80d12d0..33405a0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2981,11 +2981,17 @@  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
     return NULL;
 }
 
-static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
+static CpuInstanceProperties
+spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
 {
-    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
-     * socket means much for the paravirtualized PAPR platform) */
-    return cpu_index / smp_threads / smp_cores;
+    CPUArchId *core_slot;
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+
+    /* make sure possible_cpu are intialized */
+    mc->possible_cpu_arch_ids(machine);
+    core_slot = spapr_find_cpu_slot(machine, cpu_index, NULL);
+    assert(core_slot);
+    return core_slot->props;
 }
 
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
@@ -3012,8 +3018,15 @@  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
         machine->possible_cpus->cpus[i].arch_id = core_id;
         machine->possible_cpus->cpus[i].props.has_core_id = true;
         machine->possible_cpus->cpus[i].props.core_id = core_id;
-        /* TODO: add 'has_node/node' here to describe
-           to which node core belongs */
+
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa init code will enable them later if manual mapping wasn't
+             * present on CLI */
+            machine->possible_cpus->cpus[i].props.node_id =
+                core_id / smp_threads / smp_cores % nb_numa_nodes;
+        }
     }
     return machine->possible_cpus;
 }
@@ -3138,7 +3151,7 @@  static void spapr_machine_class_init(ObjectClass *oc, void *data)
     hc->pre_plug = spapr_machine_device_pre_plug;
     hc->plug = spapr_machine_device_plug;
     hc->unplug = spapr_machine_device_unplug;
-    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
+    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
     mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
     hc->unplug_request = spapr_machine_device_unplug_request;
 
diff --git a/numa.c b/numa.c
index 6fc2393..ab1661d 100644
--- a/numa.c
+++ b/numa.c
@@ -294,9 +294,10 @@  static void validate_numa_cpus(void)
     g_free(seen_cpus);
 }
 
-void parse_numa_opts(MachineClass *mc)
+void parse_numa_opts(MachineState *ms)
 {
     int i;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
 
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_cpu = bitmap_new(max_cpus);
@@ -378,14 +379,16 @@  void parse_numa_opts(MachineClass *mc)
          * rule grouping VCPUs by socket so that VCPUs from the same socket
          * would be on the same node.
          */
+        if (!mc->cpu_index_to_instance_props) {
+            error_report("default CPUs to NUMA node mapping isn't supported");
+            exit(1);
+        }
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                unsigned node_id = i % nb_numa_nodes;
-                if (mc->cpu_index_to_socket_id) {
-                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
-                }
+                CpuInstanceProperties props;
+                props = mc->cpu_index_to_instance_props(ms, i);
 
-                set_bit(i, numa_info[node_id].node_cpu);
+                set_bit(i, numa_info[props.node_id].node_cpu);
             }
         }
 
diff --git a/vl.c b/vl.c
index f46e070..c63f4d5 100644
--- a/vl.c
+++ b/vl.c
@@ -4506,7 +4506,7 @@  int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts(machine_class);
+    parse_numa_opts(current_machine);
 
     if (qemu_opts_foreach(qemu_find_opts("mon"),
                           mon_init_func, NULL, NULL)) {