[RFC] spapr: Fix default NUMA node allocation for threads

Message ID 1441077316-24710-1-git-send-email-david@gibson.dropbear.id.au
State New

Commit Message

David Gibson Sept. 1, 2015, 3:15 a.m. UTC
At present, if guest NUMA nodes are requested, but the cpus in each node
are not specified, spapr just uses the default behaviour of assigning each
vcpu round-robin to nodes.

If smp_threads != 1, that will assign adjacent threads in a core to
different NUMA nodes.  As well as being just weird, that's a configuration
that can't be represented in the device tree we give to the guest, which
means the guest and qemu end up with different ideas of the NUMA topology.

This patch implements mc->cpu_index_to_socket_id in the spapr code to
make sure vcpus get assigned to nodes only at the socket granularity.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 8 ++++++++
 1 file changed, 8 insertions(+)

The default NUMA allocation is pretty broken for any normal system,
but this at least fixes it for one more case.  This is already in my
spapr-next tree, but if I can get a Reviewed-by or two, it will be
ready for merge to mainline.
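
For a concrete view of what the hook changes, here is a minimal standalone
sketch (not QEMU code: the -smp values, the two-node layout, and the final
socket_id % nb_numa_nodes step are assumptions about how the generic NUMA
code consumes the hook).  With 2 cores of 2 threads per socket and two NUMA
nodes, the old per-vcpu round-robin splits sibling threads across nodes,
while the socket-based mapping keeps each core's threads together:

/* Standalone sketch, not QEMU code: smp_threads/smp_cores and the node
 * derivation are hard-coded here purely to show the arithmetic. */
#include <stdio.h>

static unsigned cpu_index_to_socket_id(unsigned cpu_index,
                                       unsigned smp_threads,
                                       unsigned smp_cores)
{
    /* Same arithmetic as the patch: socket = cpu_index / threads / cores */
    return cpu_index / smp_threads / smp_cores;
}

int main(void)
{
    const unsigned smp_threads = 2, smp_cores = 2, smp_cpus = 8;
    const unsigned nb_numa_nodes = 2;
    unsigned i;

    for (i = 0; i < smp_cpus; i++) {
        unsigned socket = cpu_index_to_socket_id(i, smp_threads, smp_cores);
        /* Old default: per-vcpu round-robin alternates nodes, splitting
         * sibling threads; with the hook, the node follows the socket id,
         * so the threads of a core stay on one node. */
        printf("vcpu %u: old node %u, socket %u, new node %u\n",
               i, i % nb_numa_nodes, socket, socket % nb_numa_nodes);
    }
    return 0;
}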

Comments

Alexey Kardashevskiy Sept. 2, 2015, 8:25 a.m. UTC | #1
On 09/01/2015 01:15 PM, David Gibson wrote:
> At present, if guest NUMA nodes are requested, but the cpus in each node
> are not specified, spapr just uses the default behaviour of assigning each
> vcpu round-robin to nodes.
>
> If smp_threads != 1, that will assign adjacent threads in a core to
> different NUMA nodes.  As well as being just weird, that's a configuration
> that can't be represented in the device tree we give to the guest, which
> means the guest and qemu end up with different ideas of the NUMA topology.
>
> This patch implements mc->cpu_index_to_socket_id in the spapr code to
> make sure vcpus get assigned to nodes only at the socket granularity.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>   hw/ppc/spapr.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> The default NUMA allocation is pretty broken for any normal system,
> but this at least fixes it for one more case.  This is already in my
> spapr-next tree, but if I can get a Reviewed-by or two, it will be
> ready for merge to mainline.
>
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bf0c64f..8c2b103 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1820,6 +1820,13 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>       }
>   }
>
> +static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
> +{
> +    /* Allocate to NUMA nodes on a "socket" basis (not that the concept
> +     * of a socket means much for the paravirtualized PAPR platform) */
> +    return cpu_index / smp_threads / smp_cores;



This bothers me, as "ibm,chip-id" is calculated differently in
spapr_populate_cpu_dt(), and your scheme gives different socket numbers for
weird cases like -smp 16,sockets=3,cores=4,threads=2.


In general, I do not really understand why there is a "sockets" parameter
in QEMU at all...
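
As a rough worked example of the mismatch for the -smp
16,sockets=3,cores=4,threads=2 case mentioned above: the ibm,chip-id formula
used below, cpu_index / (smp_cpus / sockets), is an assumption about what
spapr_populate_cpu_dt() does and is not taken from the patch; only the
socket_id formula comes from the patch.

/* Sketch of the divergence pointed out above; the chip-id formula is an
 * assumption, only the socket_id formula comes from the patch. */
#include <stdio.h>

int main(void)
{
    const unsigned smp_cpus = 16, sockets = 3, smp_cores = 4, smp_threads = 2;
    const unsigned cpus_per_socket = smp_cpus / sockets;  /* 16 / 3 = 5 */
    unsigned i;

    for (i = 0; i < smp_cpus; i++) {
        unsigned chip_id = i / cpus_per_socket;            /* assumed dt value */
        unsigned socket_id = i / smp_threads / smp_cores;  /* patch's formula */
        printf("vcpu %2u: assumed ibm,chip-id %u, socket_id %u%s\n",
               i, chip_id, socket_id,
               chip_id != socket_id ? "  <-- differs" : "");
    }
    return 0;
}

With those numbers, vcpus 5-7 and 10-15 get different values from the two
formulas, under the assumed chip-id calculation.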

> +}
> +
>   static void spapr_machine_class_init(ObjectClass *oc, void *data)
>   {
>       MachineClass *mc = MACHINE_CLASS(oc);
> @@ -1836,6 +1843,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>       mc->kvm_type = spapr_kvm_type;
>       mc->has_dynamic_sysbus = true;
>       mc->pci_allow_0_address = true;
> +    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
>
>       fwc->get_dev_path = spapr_get_fw_dev_path;
>       nc->nmi_monitor_handler = spapr_nmi;
>

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0c64f..8c2b103 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1820,6 +1820,13 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
+static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
+{
+    /* Allocate to NUMA nodes on a "socket" basis (not that the concept
+     * of a socket means much for the paravirtualized PAPR platform) */
+    return cpu_index / smp_threads / smp_cores;
+}
+
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -1836,6 +1843,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->kvm_type = spapr_kvm_type;
     mc->has_dynamic_sysbus = true;
     mc->pci_allow_0_address = true;
+    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
 
     fwc->get_dev_path = spapr_get_fw_dev_path;
     nc->nmi_monitor_handler = spapr_nmi;