From patchwork Tue Mar 17 15:48:38 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Igor Mammedov
X-Patchwork-Id: 451049
From: Igor Mammedov
To: qemu-devel@nongnu.org
Cc: ehabkost@redhat.com
Date: Tue, 17 Mar 2015 15:48:38 +0000
Message-Id: <1426607318-22728-1-git-send-email-imammedo@redhat.com>
Subject: [Qemu-devel] [PATCH for-2.3] numa: pc: fix default VCPU to node mapping

Since commit
    dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT
the Linux kernel actually tries to use the CPU to node mapping from the
QEMU-provided SRAT table instead of discarding it, and in some cases
that breaks build_sched_domains(), which expects a sane mapping where
cores/threads belonging to the same socket are on the same NUMA node.

With the current default round-robin mapping of VCPUs to nodes, the
guest ends up with cores/threads belonging to the same socket being
on different NUMA nodes.

For example, with the following CLI:
    qemu-kvm -m 4G -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
             -numa node,nodeid=0 -numa node,nodeid=1
2.6.32-based kernels will hang on boot due to an incorrectly built
sched_group-s list in update_sd_lb_stats().

So the comment in QEMU justifying the dumb default mapping:
 "
  guest OSes must cope with this anyway, because there are BIOSes
  out there in real machines which also use this scheme.
 "
isn't really valid.

Replacing the default mapping with a manual one, where VCPUs belonging
to the same socket are on the same NUMA node, fixes the issue for
guests which can't handle a nonsense topology, i.e. changing the CLI to:
    -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7

So instead of simply scattering VCPUs around nodes, map VCPUs of the
same socket to the same NUMA node, which is what a guest would expect
from sane hardware/BIOS.

Signed-off-by: Igor Mammedov
---
 include/sysemu/cpus.h                 |  3 +++
 numa.c                                | 14 ++++++++++----
 stubs/Makefile.objs                   |  1 +
 stubs/qemu_cpu_socket_id_from_index.c |  6 ++++++
 target-i386/cpu.c                     | 11 +++++++++++
 5 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 stubs/qemu_cpu_socket_id_from_index.c

diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index 3f162a9..aacabcb 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -1,5 +1,6 @@
 #ifndef QEMU_CPUS_H
 #define QEMU_CPUS_H
+#include "qemu-common.h"
 
 /* cpus.c */
 void qemu_init_cpu_loop(void);
@@ -18,6 +19,8 @@ void qtest_clock_warp(int64_t dest);
 /* vl.c */
 extern int smp_cores;
 extern int smp_threads;
+
+unsigned qemu_cpu_socket_id_from_index(unsigned int cpu_index);
 #else
 /* *-user doesn't have configurable SMP topology */
 #define smp_cores 1
diff --git a/numa.c b/numa.c
index ffbec68..5297749 100644
--- a/numa.c
+++ b/numa.c
@@ -26,6 +26,7 @@
 #include "exec/cpu-common.h"
 #include "qemu/bitmap.h"
 #include "qom/cpu.h"
+#include "sysemu/cpus.h"
 #include "qemu/error-report.h"
 #include "include/exec/cpu-common.h" /* for RAM_ADDR_FMT */
 #include "qapi-visit.h"
@@ -233,13 +234,18 @@ void parse_numa_opts(void)
                 break;
             }
         }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
+        /* Assign VCPUs from the same socket to the same node.
+         * Since mapping is arch dependent, target that care about
+         * correct mapping of VCPUs to node should implement
+         * qemu_cpu_socket_id_from_index() function that maps cpu_index to
+         * a socket #, for all other cases legacy round-robin mode
+         * will be used.
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
+                unsigned socket_id = qemu_cpu_socket_id_from_index(i);
+
+                set_bit(i, numa_info[socket_id % nb_numa_nodes].node_cpu);
             }
         }
     }
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 8beff4c..86b8060 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -39,3 +39,4 @@ stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
 stub-obj-y += kvm.o
 stub-obj-y += qmp_pc_dimm_device_list.o
+stub-obj-y += qemu_cpu_socket_id_from_index.o
diff --git a/stubs/qemu_cpu_socket_id_from_index.c b/stubs/qemu_cpu_socket_id_from_index.c
new file mode 100644
index 0000000..3d8ea8b
--- /dev/null
+++ b/stubs/qemu_cpu_socket_id_from_index.c
@@ -0,0 +1,6 @@
+#include "sysemu/cpus.h"
+
+unsigned qemu_cpu_socket_id_from_index(unsigned int cpu_index)
+{
+    return cpu_index;
+}
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index ed7e5d5..7a7e236 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -47,6 +47,7 @@
 #include "hw/xen/xen.h"
 #include "hw/i386/apic_internal.h"
 #endif
+#include "hw/i386/topology.h"
 
 /* Cache topology CPUID constants: */
 
@@ -2822,6 +2823,16 @@ out:
     }
 }
 
+#ifndef CONFIG_USER_ONLY
+unsigned qemu_cpu_socket_id_from_index(unsigned int cpu_index)
+{
+    unsigned pkg_id, core_id, smt_id;
+    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
+                          &pkg_id, &core_id, &smt_id);
+    return pkg_id;
+}
+#endif
+
 static void x86_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
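
For illustration only (not part of the patch): a minimal standalone
sketch comparing the legacy round-robin default with the socket-based
default described in the commit message. All helper names below are
hypothetical. In particular, socket_id_from_index() is only a stand-in
for qemu_cpu_socket_id_from_index() and assumes VCPUs are numbered
thread-first, then core, then socket, so that the socket is simply
cpu_index / (cores * threads).

```c
#include <assert.h>
#include <stdio.h>

/* Legacy default: scatter VCPUs across nodes round-robin. */
unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return cpu_index % nb_nodes;
}

/* Hypothetical stand-in for qemu_cpu_socket_id_from_index().
 * Assumption: VCPUs are enumerated thread-first, then core, then socket. */
unsigned socket_id_from_index(unsigned cpu_index,
                              unsigned cores, unsigned threads)
{
    assert(cores > 0 && threads > 0);
    return cpu_index / (cores * threads);
}

/* Socket-based default: every VCPU of a socket lands on the same node. */
unsigned node_by_socket(unsigned cpu_index, unsigned cores,
                        unsigned threads, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return socket_id_from_index(cpu_index, cores, threads) % nb_nodes;
}

/* Print both default mappings side by side for every possible VCPU. */
void print_default_mappings(unsigned max_cpus, unsigned cores,
                            unsigned threads, unsigned nb_nodes)
{
    unsigned i;

    printf("cpu socket round-robin by-socket\n");
    for (i = 0; i < max_cpus; i++) {
        printf("%3u %6u %11u %9u\n", i,
               socket_id_from_index(i, cores, threads),
               node_round_robin(i, nb_nodes),
               node_by_socket(i, cores, threads, nb_nodes));
    }
}
```

Calling print_default_mappings(8, 4, 1, 2) corresponds to the example
CLI (cores=4, threads=1, maxcpus=8, two nodes): round-robin alternates
VCPUs 0..7 between nodes 0 and 1, while the socket-based variant puts
VCPUs 0-3 on node 0 and VCPUs 4-7 on node 1, which matches the manual
cpus=0-3 / cpus=4-7 assignment from the commit message.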