diff mbox

[v2,3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled

Message ID 1495550330-34087-4-git-send-email-imammedo@redhat.com
State New
Headers show

Commit Message

Igor Mammedov May 23, 2017, 2:38 p.m. UTC
It fixes/add missing _PXM object for non mapped CPU (x86)
and missing fdt node (virt-arm).

It ensures that possible_cpus contains complete mapping if
numa is enabled by the time machine_init() is executed.

As result non completely mapped CPUs:
 1) appear in ACPI/fdt blobs
 2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
 3) allows to drop checks for has_node_id in numa only code,
   reducing number of invariants incomplete mapping could produce
 4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
   (when CPU object is created) to machine_numa_finish_init() which
   helps to fix [1, 2] and make possible_cpus complete source
   of numa mapping available even before CPUs are created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/arm/virt-acpi-build.c |  4 +---
 hw/core/machine.c        | 16 ++++++++++------
 hw/i386/acpi-build.c     |  3 +--
 hw/i386/pc.c             |  4 +---
 numa.c                   |  7 +------
 5 files changed, 14 insertions(+), 20 deletions(-)

Comments

Eduardo Habkost May 26, 2017, 4:06 p.m. UTC | #1
On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> It fixes/add missing _PXM object for non mapped CPU (x86)
> and missing fdt node (virt-arm).
> 
> It ensures that possible_cpus contains complete mapping if
> numa is enabled by the time machine_init() is executed.
> 
> As result non completely mapped CPUs:
>  1) appear in ACPI/fdt blobs
>  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
>  3) allows to drop checks for has_node_id in numa only code,
>    reducing number of invariants incomplete mapping could produce
>  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
>    (when CPU object is created) to machine_numa_finish_init() which
>    helps to fix [1, 2] and make possible_cpus complete source
>    of numa mapping available even before CPUs are created.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Code that will be affected, but will keep the same behavior:

* pre_plug handler when node-id is omitted on device_add: it
  already sets node_id=0 if !has_node_id
  * However, pre_plug behavior is being changed when node-id is
    set. See below.
* build_srat() (both x86 and arm): already sets node_id = 0 if
  !has_node_id
* bochs_bios_init(): already uses g_new0(), so already had
  omitted entries set to 0.

Affected code that will change behavior with this patch:

* pre_plug handler when node-id is set:
  * See my reply to patch 1/5.  This patch restores the old
    behavior that rejected node-id != 0 on omitted CPUs.
* ACPI _PXM initialization
* arm/virt.c: numa-node-id will now be set for all CPUs
* QMP/HMP query-hotpluggable-cpus output
  * Desirable change, as now the QMP command will reflect what
    the guest is really seeing

Due to the _PXM and FDT changes, I have the same objection I have
for patch 4/5: if we are going to break user expectations when
using incomplete NUMA mappings, let's do that only once (when we
start rejecting incomplete NUMA mappings).

A few extra comments about the has_node_id checks below:

> ---
>  hw/arm/virt-acpi-build.c |  4 +---
>  hw/core/machine.c        | 16 ++++++++++------
>  hw/i386/acpi-build.c     |  3 +--
>  hw/i386/pc.c             |  4 +---
>  numa.c                   |  7 +------
>  5 files changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e585206..977a794 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      srat->reserved1 = cpu_to_le32(1);
>  
>      for (i = 0; i < cpu_list->len; ++i) {
> -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> -            cpu_list->cpus[i].props.node_id : 0;
>          core = acpi_data_push(table_data, sizeof(*core));
>          core->type = ACPI_SRAT_PROCESSOR_GICC;
>          core->length = sizeof(*core);
> -        core->proximity = cpu_to_le32(node_id);
> +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
>          core->acpi_processor_uid = cpu_to_le32(i);
>          core->flags = cpu_to_le32(1);
>      }
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index aaf3cff..964b75d 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
>          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
>  
>          if (!cpu_slot->props.has_node_id) {
> -            if (default_mapping) {
> -                /* fetch default mapping from board and enable it */
> -                CpuInstanceProperties props = cpu_slot->props;
> -                props.has_node_id = true;
> -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> -            } else {
> +            /* fetch default mapping from board and enable it */
> +            CpuInstanceProperties props = cpu_slot->props;
> +
> +            if (!default_mapping) {
>                  /* record slots with not set mapping,
>                   * TODO: make it hard error in future */
>                  char *cpu_str = cpu_slot_to_string(cpu_slot);
>                  g_string_append_printf(s, "%sCPU %d [%s]",
>                                         s->len ? ", " : "", i, cpu_str);
>                  g_free(cpu_str);
> +
> +                /* non mapped cpus used to fallback to node 0 */
> +                props.node_id = 0;
>              }
> +
> +            props.has_node_id = true;
> +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
>          }
>      }
>      if (s->len) {
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index afcadac..873880d 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>      srat->reserved1 = cpu_to_le32(1);
>  
>      for (i = 0; i < apic_ids->len; i++) {
> -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> -            apic_ids->cpus[i].props.node_id : 0;
> +        int node_id = apic_ids->cpus[i].props.node_id;

What about an assert(props.has_node_id) here?

>          uint32_t apic_id = apic_ids->cpus[i].arch_id;
>  
>          if (apic_id < 255) {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index cf09949..84f0a6f 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -788,9 +788,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>      for (i = 0; i < cpus->len; i++) {
>          unsigned int apic_id = cpus->cpus[i].arch_id;
>          assert(apic_id < pcms->apic_id_limit);
> -        if (cpus->cpus[i].props.has_node_id) {
> -            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
> -        }
> +        numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);

Ditto.

>      }
>      for (i = 0; i < nb_numa_nodes; i++) {
>          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> diff --git a/numa.c b/numa.c
> index e30702e..4ef6dea 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
>      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
>  
>      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> -        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> -         * TODO: make it error when incomplete numa mapping support is removed
> -         */
> -        node_id = 0;
> -
>          /* due to bug in libvirt, it doesn't pass node-id from props on
>           * device_add as expected, so we have to fix it up here */
>          if (slot->props.has_node_id) {
>              node_id = slot->props.node_id;
> +            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
>          }
> -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);

Why not removing the has_node_id check completely, here?

In case this patch get respun, I suggest also removing the
has_node_id check at: build_cpus_aml() _PXM code.

>      } else if (node_id != slot->props.node_id) {
>          error_setg(errp, "node-id=%d must match numa node specified "
>                     "with -numa option", node_id);
> -- 
> 2.7.4
> 
>
Igor Mammedov May 29, 2017, 11:45 a.m. UTC | #2
On Fri, 26 May 2017 13:06:30 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> > It fixes/add missing _PXM object for non mapped CPU (x86)
> > and missing fdt node (virt-arm).
> > 
> > It ensures that possible_cpus contains complete mapping if
> > numa is enabled by the time machine_init() is executed.
> > 
> > As result non completely mapped CPUs:
> >  1) appear in ACPI/fdt blobs
> >  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
> >  3) allows to drop checks for has_node_id in numa only code,
> >    reducing number of invariants incomplete mapping could produce
> >  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
> >    (when CPU object is created) to machine_numa_finish_init() which
> >    helps to fix [1, 2] and make possible_cpus complete source
> >    of numa mapping available even before CPUs are created.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> Code that will be affected, but will keep the same behavior:
> 
> * pre_plug handler when node-id is omitted on device_add: it
>   already sets node_id=0 if !has_node_id
>   * However, pre_plug behavior is being changed when node-id is
>     set. See below.
> * build_srat() (both x86 and arm): already sets node_id = 0 if
>   !has_node_id
> * bochs_bios_init(): already uses g_new0(), so already had
>   omitted entries set to 0.
> 
> Affected code that will change behavior with this patch:
> 
> * pre_plug handler when node-id is set:
>   * See my reply to patch 1/5.  This patch restores the old
>     behavior that rejected node-id != 0 on omitted CPUs.
I'll look into it.

> * ACPI _PXM initialization
> * arm/virt.c: numa-node-id will now be set for all CPUs
> * QMP/HMP query-hotpluggable-cpus output
>   * Desirable change, as now the QMP command will reflect what
>     the guest is really seeing
> 
> Due to the _PXM and FDT changes, I have the same objection I have
> for patch 4/5: if we are going to break user expectations when
> using incomplete NUMA mappings, let's do that only once (when we
> start rejecting incomplete NUMA mappings).

FDT/PXM change is bugfix to firmware that I insist on.
Firmware should properly describe all CPUs instead of repeating
partial CPU mapping nonsense.

Wrt _PXM change see
(27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa enabled)
provided we keep 0 node fallback by dropping 4/5
linux guest will behave the same as without _PXM.

And again there wasn't any compat code around that
and guest saw node-id change when it's been restarted.

Wrt FDT change, it's the same and I also talked with
Drew about it, his reply was that for ARM target
they don't care much about NUMA as not all pieces are
there to make pinning work on host side yet.

PS:
There wasn't any promise made to keep firmware side
bug compatible or versioned so users should not have
expectations about it in the first place.

Pls don't push for bug compat stuff in the firmware
as it will rise false expectations (at least until
we decide to care about it by default).

(I'd use firmware compat stuff only in cases where
it breaks guest horribly, i.e. crash/fail boot)


> A few extra comments about the has_node_id checks below:
> 
> > ---
> >  hw/arm/virt-acpi-build.c |  4 +---
> >  hw/core/machine.c        | 16 ++++++++++------
> >  hw/i386/acpi-build.c     |  3 +--
> >  hw/i386/pc.c             |  4 +---
> >  numa.c                   |  7 +------
> >  5 files changed, 14 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index e585206..977a794 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      srat->reserved1 = cpu_to_le32(1);
> >  
> >      for (i = 0; i < cpu_list->len; ++i) {
> > -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > -            cpu_list->cpus[i].props.node_id : 0;
> >          core = acpi_data_push(table_data, sizeof(*core));
> >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> >          core->length = sizeof(*core);
> > -        core->proximity = cpu_to_le32(node_id);
> > +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> >          core->acpi_processor_uid = cpu_to_le32(i);
> >          core->flags = cpu_to_le32(1);
> >      }
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index aaf3cff..964b75d 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
> >          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> >  
> >          if (!cpu_slot->props.has_node_id) {
> > -            if (default_mapping) {
> > -                /* fetch default mapping from board and enable it */
> > -                CpuInstanceProperties props = cpu_slot->props;
> > -                props.has_node_id = true;
> > -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > -            } else {
> > +            /* fetch default mapping from board and enable it */
> > +            CpuInstanceProperties props = cpu_slot->props;
> > +
> > +            if (!default_mapping) {
> >                  /* record slots with not set mapping,
> >                   * TODO: make it hard error in future */
> >                  char *cpu_str = cpu_slot_to_string(cpu_slot);
> >                  g_string_append_printf(s, "%sCPU %d [%s]",
> >                                         s->len ? ", " : "", i, cpu_str);
> >                  g_free(cpu_str);
> > +
> > +                /* non mapped cpus used to fallback to node 0 */
> > +                props.node_id = 0;
> >              }
> > +
> > +            props.has_node_id = true;
> > +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
> >          }
> >      }
> >      if (s->len) {
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index afcadac..873880d 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> >      srat->reserved1 = cpu_to_le32(1);
> >  
> >      for (i = 0; i < apic_ids->len; i++) {
> > -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> > -            apic_ids->cpus[i].props.node_id : 0;
> > +        int node_id = apic_ids->cpus[i].props.node_id;  
> 
> What about an assert(props.has_node_id) here?
I considered it but dropped idea, as it's mostly useless
since build_srat() is called only when numa is enabled.

and sticking asserts in every place just in case doesn't
seem to me like good idea as it would lead to absurd 
asserts after every line.

Considering the new code behaves the same in case !has_node_id
I don't see any reason to put assert here, the same goes
for 'Ditto" comment in the next hunk.

But if you insist I can' add assert().

> 
> >          uint32_t apic_id = apic_ids->cpus[i].arch_id;
> >  
> >          if (apic_id < 255) {
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index cf09949..84f0a6f 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -788,9 +788,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
> >      for (i = 0; i < cpus->len; i++) {
> >          unsigned int apic_id = cpus->cpus[i].arch_id;
> >          assert(apic_id < pcms->apic_id_limit);
> > -        if (cpus->cpus[i].props.has_node_id) {
> > -            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
> > -        }
> > +        numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);  
> 
> Ditto.
> 
> >      }
> >      for (i = 0; i < nb_numa_nodes; i++) {
> >          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> > diff --git a/numa.c b/numa.c
> > index e30702e..4ef6dea 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
> >      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
> >  
> >      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> > -        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> > -         * TODO: make it error when incomplete numa mapping support is removed
> > -         */
> > -        node_id = 0;
> > -
> >          /* due to bug in libvirt, it doesn't pass node-id from props on
> >           * device_add as expected, so we have to fix it up here */
> >          if (slot->props.has_node_id) {
> >              node_id = slot->props.node_id;
> > +            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
> >          }
> > -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);  
> 
> Why not removing the has_node_id check completely, here?
because numa_cpu_pre_plug() is called unconditionally
As alternative, I can replace has_node_id check with assert
and do following from callers:

 if (nb_numa_nodes) {
     numa_cpu_pre_plug()
 }


> In case this patch get respun, I suggest also removing the
> has_node_id check at: build_cpus_aml() _PXM code.
that would be wrong, build_cpus_aml() is called for both
numa and non numa use-cases, there shouldn't be _PXM
object if numa is not enabled, hence the check.

 
> >      } else if (node_id != slot->props.node_id) {
> >          error_setg(errp, "node-id=%d must match numa node specified "
> >                     "with -numa option", node_id);
> > -- 
> > 2.7.4
> > 
> >   
>
Eduardo Habkost May 29, 2017, 1:22 p.m. UTC | #3
On Mon, May 29, 2017 at 01:45:36PM +0200, Igor Mammedov wrote:
> On Fri, 26 May 2017 13:06:30 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> > > It fixes/add missing _PXM object for non mapped CPU (x86)
> > > and missing fdt node (virt-arm).
> > > 
> > > It ensures that possible_cpus contains complete mapping if
> > > numa is enabled by the time machine_init() is executed.
> > > 
> > > As result non completely mapped CPUs:
> > >  1) appear in ACPI/fdt blobs
> > >  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
> > >  3) allows to drop checks for has_node_id in numa only code,
> > >    reducing number of invariants incomplete mapping could produce
> > >  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
> > >    (when CPU object is created) to machine_numa_finish_init() which
> > >    helps to fix [1, 2] and make possible_cpus complete source
> > >    of numa mapping available even before CPUs are created.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > Code that will be affected, but will keep the same behavior:
> > 
> > * pre_plug handler when node-id is omitted on device_add: it
> >   already sets node_id=0 if !has_node_id
> >   * However, pre_plug behavior is being changed when node-id is
> >     set. See below.
> > * build_srat() (both x86 and arm): already sets node_id = 0 if
> >   !has_node_id
> > * bochs_bios_init(): already uses g_new0(), so already had
> >   omitted entries set to 0.
> > 
> > Affected code that will change behavior with this patch:
> > 
> > * pre_plug handler when node-id is set:
> >   * See my reply to patch 1/5.  This patch restores the old
> >     behavior that rejected node-id != 0 on omitted CPUs.
> I'll look into it.
> 
> > * ACPI _PXM initialization
> > * arm/virt.c: numa-node-id will now be set for all CPUs
> > * QMP/HMP query-hotpluggable-cpus output
> >   * Desirable change, as now the QMP command will reflect what
> >     the guest is really seeing
> > 
> > Due to the _PXM and FDT changes, I have the same objection I have
> > for patch 4/5: if we are going to break user expectations when
> > using incomplete NUMA mappings, let's do that only once (when we
> > start rejecting incomplete NUMA mappings).
> 
> FDT/PXM change is bugfix to firmware that I insist on.
> Firmware should properly describe all CPUs instead of repeating
> partial CPU mapping nonsense.
> 
> Wrt _PXM change see
> (27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa enabled)
> provided we keep 0 node fallback by dropping 4/5
> linux guest will behave the same as without _PXM.

If it is a bug fix, then I agree with it.  Also, this doesn't
change the topology of the virtual hardware, so it shouldn't
break user expectations anyway.

> 
> And again there wasn't any compat code around that
> and guest saw node-id change when it's been restarted.
> 
> Wrt FDT change, it's the same and I also talked with
> Drew about it, his reply was that for ARM target
> they don't care much about NUMA as not all pieces are
> there to make pinning work on host side yet.
> 
> PS:
> There wasn't any promise made to keep firmware side
> bug compatible or versioned so users should not have
> expectations about it in the first place.

You are right about this.

> 
> Pls don't push for bug compat stuff in the firmware
> as it will rise false expectations (at least until
> we decide to care about it by default).

I won't.  Patch 4/5 is different because it makes firmware
describe different hardware.  But this one is firmware-only, so
you are right.

> 
> (I'd use firmware compat stuff only in cases where
> it breaks guest horribly, i.e. crash/fail boot)
> 
> 
> > A few extra comments about the has_node_id checks below:
> > 
> > > ---
> > >  hw/arm/virt-acpi-build.c |  4 +---
> > >  hw/core/machine.c        | 16 ++++++++++------
> > >  hw/i386/acpi-build.c     |  3 +--
> > >  hw/i386/pc.c             |  4 +---
> > >  numa.c                   |  7 +------
> > >  5 files changed, 14 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index e585206..977a794 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > >      for (i = 0; i < cpu_list->len; ++i) {
> > > -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > > -            cpu_list->cpus[i].props.node_id : 0;
> > >          core = acpi_data_push(table_data, sizeof(*core));
> > >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> > >          core->length = sizeof(*core);
> > > -        core->proximity = cpu_to_le32(node_id);
> > > +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> > >          core->acpi_processor_uid = cpu_to_le32(i);
> > >          core->flags = cpu_to_le32(1);
> > >      }
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index aaf3cff..964b75d 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
> > >          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> > >  
> > >          if (!cpu_slot->props.has_node_id) {
> > > -            if (default_mapping) {
> > > -                /* fetch default mapping from board and enable it */
> > > -                CpuInstanceProperties props = cpu_slot->props;
> > > -                props.has_node_id = true;
> > > -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > > -            } else {
> > > +            /* fetch default mapping from board and enable it */
> > > +            CpuInstanceProperties props = cpu_slot->props;
> > > +
> > > +            if (!default_mapping) {
> > >                  /* record slots with not set mapping,
> > >                   * TODO: make it hard error in future */
> > >                  char *cpu_str = cpu_slot_to_string(cpu_slot);
> > >                  g_string_append_printf(s, "%sCPU %d [%s]",
> > >                                         s->len ? ", " : "", i, cpu_str);
> > >                  g_free(cpu_str);
> > > +
> > > +                /* non mapped cpus used to fallback to node 0 */
> > > +                props.node_id = 0;
> > >              }
> > > +
> > > +            props.has_node_id = true;
> > > +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > >          }
> > >      }
> > >      if (s->len) {
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index afcadac..873880d 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > >      for (i = 0; i < apic_ids->len; i++) {
> > > -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> > > -            apic_ids->cpus[i].props.node_id : 0;
> > > +        int node_id = apic_ids->cpus[i].props.node_id;  
> > 
> > What about an assert(props.has_node_id) here?
> I considered it but dropped idea, as it's mostly useless
> since build_srat() is called only when numa is enabled.

I just suggested it because I would be more confident that the
new code that ensures has_node_id is set for all slots is working
and won't break if we refactor it.  But not a big deal.

> 
> and sticking asserts in every place just in case doesn't
> seem to me like good idea as it would lead to absurd 
> asserts after every line.
> 
> Considering the new code behaves the same in case !has_node_id
> I don't see any reason to put assert here, the same goes
> for 'Ditto" comment in the next hunk.
> 
> But if you insist I can' add assert().

I won't mind if you decide to not add it.

> 
[...]
> > >      }
> > >      for (i = 0; i < nb_numa_nodes; i++) {
> > >          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> > > diff --git a/numa.c b/numa.c
> > > index e30702e..4ef6dea 100644
> > > --- a/numa.c
> > > +++ b/numa.c
> > > @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
> > >      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
> > >  
> > >      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> > > -        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> > > -         * TODO: make it error when incomplete numa mapping support is removed
> > > -         */
> > > -        node_id = 0;
> > > -
> > >          /* due to bug in libvirt, it doesn't pass node-id from props on
> > >           * device_add as expected, so we have to fix it up here */
> > >          if (slot->props.has_node_id) {
> > >              node_id = slot->props.node_id;
> > > +            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
> > >          }
> > > -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);  
> > 
> > Why not removing the has_node_id check completely, here?
> because numa_cpu_pre_plug() is called unconditionally
> As alternative, I can replace has_node_id check with assert
> and do following from callers:
> 
>  if (nb_numa_nodes) {
>      numa_cpu_pre_plug()
>  }
> 
> 
> > In case this patch get respun, I suggest also removing the
> > has_node_id check at: build_cpus_aml() _PXM code.
> that would be wrong, build_cpus_aml() is called for both
> numa and non numa use-cases, there shouldn't be _PXM
> object if numa is not enabled, hence the check.
> 
>  

Oh, I forgot about the non-NUMA cases.  You're right.
diff mbox

Patch

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e585206..977a794 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -496,12 +496,10 @@  build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     srat->reserved1 = cpu_to_le32(1);
 
     for (i = 0; i < cpu_list->len; ++i) {
-        int node_id = cpu_list->cpus[i].props.has_node_id ?
-            cpu_list->cpus[i].props.node_id : 0;
         core = acpi_data_push(table_data, sizeof(*core));
         core->type = ACPI_SRAT_PROCESSOR_GICC;
         core->length = sizeof(*core);
-        core->proximity = cpu_to_le32(node_id);
+        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
         core->acpi_processor_uid = cpu_to_le32(i);
         core->flags = cpu_to_le32(1);
     }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index aaf3cff..964b75d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -720,19 +720,23 @@  static void machine_numa_finish_init(MachineState *machine)
         const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
 
         if (!cpu_slot->props.has_node_id) {
-            if (default_mapping) {
-                /* fetch default mapping from board and enable it */
-                CpuInstanceProperties props = cpu_slot->props;
-                props.has_node_id = true;
-                machine_set_cpu_numa_node(machine, &props, &error_fatal);
-            } else {
+            /* fetch default mapping from board and enable it */
+            CpuInstanceProperties props = cpu_slot->props;
+
+            if (!default_mapping) {
                 /* record slots with not set mapping,
                  * TODO: make it hard error in future */
                 char *cpu_str = cpu_slot_to_string(cpu_slot);
                 g_string_append_printf(s, "%sCPU %d [%s]",
                                        s->len ? ", " : "", i, cpu_str);
                 g_free(cpu_str);
+
+                /* non mapped cpus used to fallback to node 0 */
+                props.node_id = 0;
             }
+
+            props.has_node_id = true;
+            machine_set_cpu_numa_node(machine, &props, &error_fatal);
         }
     }
     if (s->len) {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index afcadac..873880d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2335,8 +2335,7 @@  build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     srat->reserved1 = cpu_to_le32(1);
 
     for (i = 0; i < apic_ids->len; i++) {
-        int node_id = apic_ids->cpus[i].props.has_node_id ?
-            apic_ids->cpus[i].props.node_id : 0;
+        int node_id = apic_ids->cpus[i].props.node_id;
         uint32_t apic_id = apic_ids->cpus[i].arch_id;
 
         if (apic_id < 255) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index cf09949..84f0a6f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -788,9 +788,7 @@  static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     for (i = 0; i < cpus->len; i++) {
         unsigned int apic_id = cpus->cpus[i].arch_id;
         assert(apic_id < pcms->apic_id_limit);
-        if (cpus->cpus[i].props.has_node_id) {
-            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
-        }
+        numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
     }
     for (i = 0; i < nb_numa_nodes; i++) {
         numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
diff --git a/numa.c b/numa.c
index e30702e..4ef6dea 100644
--- a/numa.c
+++ b/numa.c
@@ -513,17 +513,12 @@  void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
     int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
 
     if (node_id == CPU_UNSET_NUMA_NODE_ID) {
-        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
-         * TODO: make it error when incomplete numa mapping support is removed
-         */
-        node_id = 0;
-
         /* due to bug in libvirt, it doesn't pass node-id from props on
          * device_add as expected, so we have to fix it up here */
         if (slot->props.has_node_id) {
             node_id = slot->props.node_id;
+            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
         }
-        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
     } else if (node_id != slot->props.node_id) {
         error_setg(errp, "node-id=%d must match numa node specified "
                    "with -numa option", node_id);