From patchwork Wed Mar 26 10:36:59 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hu Tao
X-Patchwork-Id: 333837
From: "hutao@cn.fujitsu.com"
To: "qemu-devel@nongnu.org"
Cc: "ehabkost@redhat.com", "imammedo@redhat.com", "mtosatti@redhat.com",
 Paolo Bonzini, "a.motakis@virtualopensystems.com",
 "gaowanlong@cn.fujitsu.com"
Date: Wed, 26 Mar 2014 10:36:59 +0000
Subject: [Qemu-devel] [PATCH v3 15/34] numa: add -numa node,memdev= option

From: Paolo Bonzini

This option provides the infrastructure for binding guest NUMA nodes
to host NUMA nodes.  For example:

 -object memory-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
 -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1

The option replaces "-numa node,mem=".

Signed-off-by: Paolo Bonzini
---
 include/sysemu/sysemu.h |  1 +
 numa.c                  | 62 +++++++++++++++++++++++++++++++++++++++++++++++--
 qapi-schema.json        | 11 ++++++---
 qemu-options.hx         | 12 ++++++----
 4 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index caf88dd..1e141e3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -147,6 +147,7 @@ extern int nb_numa_nodes;
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+    struct HostMemoryBackend *node_memdev;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
 void set_numa_nodes(void);
diff --git a/numa.c b/numa.c
index bcd7b04..b9850d7 100644
--- a/numa.c
+++ b/numa.c
@@ -34,6 +34,7 @@
 #include "qapi/dealloc-visitor.h"
 #include "qapi/qmp/qerror.h"
 #include "hw/boards.h"
+#include "sysemu/hostmem.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -42,6 +43,8 @@ QemuOptsList qemu_numa_opts = {
     .desc = { { 0 } } /* validated with OptsVisitor */
 };
 
+static int have_memdevs = -1;
+
 static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
@@ -68,6 +71,20 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
         bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
     }
 
+    if (node->has_mem && node->has_memdev) {
+        error_setg(errp, "qemu: cannot specify both mem= and memdev=\n");
+        return;
+    }
+
+    if (have_memdevs == -1) {
+        have_memdevs = node->has_memdev;
+    }
+    if (node->has_memdev != have_memdevs) {
+        error_setg(errp, "qemu: memdev option must be specified for either "
+                   "all or no nodes\n");
+        return;
+    }
+
     if (node->has_mem) {
         uint64_t mem_size = node->mem;
         const char *mem_str = qemu_opt_get(opts, "mem");
@@ -77,6 +94,18 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
         }
         numa_info[nodenr].node_mem = mem_size;
     }
+    if (node->has_memdev) {
+        Object *o;
+        o = object_resolve_path_type(node->memdev, TYPE_MEMORY_BACKEND, NULL);
+        if (!o) {
+            error_setg(errp, "memdev=%s is ambiguous", node->memdev);
+            return;
+        }
+
+        object_ref(o);
+        numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
+        numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
+    }
 }
 
 int numa_init_func(QemuOpts *opts, void *opaque)
@@ -196,10 +225,39 @@ void set_numa_modes(void)
     }
 }
 
+static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
+                                           const char *name,
+                                           uint64_t ram_size)
+{
+    memory_region_init_ram(mr, owner, name, ram_size);
+    vmstate_register_ram_global(mr);
+}
+
 void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
                                           const char *name,
                                           uint64_t ram_size)
 {
-    memory_region_init_ram(mr, owner, name, ram_size);
-    vmstate_register_ram_global(mr);
+    uint64_t addr = 0;
+    int i;
+
+    if (nb_numa_nodes == 0 || !have_memdevs) {
+        allocate_system_memory_nonnuma(mr, owner, name, ram_size);
+        return;
+    }
+
+    memory_region_init(mr, owner, name, ram_size);
+    for (i = 0; i < nb_numa_nodes; i++) {
+        Error *local_err = NULL;
+        uint64_t size = numa_info[i].node_mem;
+        HostMemoryBackend *backend = numa_info[i].node_memdev;
+        MemoryRegion *seg = host_memory_backend_get_memory(backend, &local_err);
+        if (local_err) {
+            qerror_report_err(local_err);
+            exit(1);
+        }
+
+        memory_region_add_subregion(mr, addr, seg);
+        vmstate_register_ram_global(seg);
+        addr += size;
+    }
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 8451d15..a8c16bc 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4708,8 +4708,12 @@
 # @cpus: #optional VCPUs belonging to this node (assign VCPUS round-robin
 #        if omitted)
 #
-# @mem: #optional memory size of this node (equally divide total memory among
-#       nodes if omitted)
+# @mem: #optional memory size of this node; mutually exclusive with @memdev.
+#       Equally divide total memory among nodes if both @mem and @memdev are
+#       omitted.
+#
+# @memdev: #optional memory backend object.  If specified for one node,
+#          it must be specified for all nodes.
 #
 # Since: 2.1
 ##
@@ -4717,4 +4721,5 @@
   'data': {
    '*nodeid': 'uint16',
    '*cpus': ['uint16'],
-   '*mem': 'size' }}
+   '*mem': 'size',
+   '*memdev': 'str' }}
diff --git a/qemu-options.hx b/qemu-options.hx
index 30faa0e..9f54b63 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -95,16 +95,20 @@ specifies the maximum number of hotpluggable CPUs.
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,mem=size][,memdev=id][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
 STEXI
-@item -numa node[,mem=@var{size}][,cpus=@var{cpu[-cpu]}][,nodeid=@var{node}]
+@item -numa node[,mem=@var{size}][,memdev=@var{id}][,cpus=@var{cpu[-cpu]}][,nodeid=@var{node}]
 @findex -numa
-Simulate a multi node NUMA system. If @samp{mem}
+Simulate a multi node NUMA system. If @samp{mem}, @samp{memdev}
 and @samp{cpus} are omitted, resources are split equally. Also, note
 that the -@option{numa} option doesn't allocate any of the specified
 resources. That is, it just assigns existing resources to NUMA nodes. This
 means that one still has to use the @option{-m}, @option{-smp} options
-to respectively allocate RAM and vCPUs.
+to respectively allocate RAM and vCPUs, and possibly @option{-object}
+to specify the memory backend for the @samp{memdev} suboption.
+
+@samp{mem} and @samp{memdev} are mutually exclusive.  Furthermore, if one
+node uses @samp{memdev}, all of them have to use it.
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
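
Note for readers of the patch: the two behaviours added to numa.c can be modelled outside QEMU. The sketch below is not QEMU code. NodeOpt, CheckResult, check_nodes() and layout_nodes() are hypothetical names made up for illustration. It assumes only what the hunks show: the first node decides whether memdev= is in use, later nodes must agree with it, and each node's backend is mapped at the running sum of the sizes of the nodes before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one parsed "-numa node,..." option. */
typedef struct {
    bool has_mem;      /* mem=size was given */
    bool has_memdev;   /* memdev=id was given */
    uint64_t size;     /* node size in bytes (from mem= or from the backend) */
} NodeOpt;

typedef enum { CHECK_OK, CHECK_BOTH, CHECK_MIXED } CheckResult;

/*
 * Models the checks in numa_node_parse(): a node may not carry both mem=
 * and memdev=, and memdev= must be used by every node or by none.
 */
CheckResult check_nodes(const NodeOpt *nodes, size_t n)
{
    int have_memdevs = -1;  /* -1: undecided, 0: no node uses it, 1: all do */

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].has_mem && nodes[i].has_memdev) {
            return CHECK_BOTH;
        }
        if (have_memdevs == -1) {
            have_memdevs = nodes[i].has_memdev;  /* first node decides */
        }
        if (nodes[i].has_memdev != have_memdevs) {
            return CHECK_MIXED;
        }
    }
    return CHECK_OK;
}

/*
 * Models the loop in memory_region_allocate_system_memory(): node i is
 * placed at the sum of the sizes of nodes 0..i-1.  Returns the total size.
 */
uint64_t layout_nodes(const NodeOpt *nodes, size_t n, uint64_t *offsets)
{
    uint64_t addr = 0;

    assert(offsets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        offsets[i] = addr;
        addr += nodes[i].size;
    }
    return addr;
}
```

With the two 1024M nodes from the commit message, check_nodes() accepts the configuration and layout_nodes() places node 0 at offset 0 and node 1 at offset 1024M. Note that the patch itself does not compare the summed backend sizes against ram_size; the model does not either.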