From patchwork Fri Nov 22 07:48:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tao Xu X-Patchwork-Id: 1199334 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47K8013cGPz9sPv for ; Fri, 22 Nov 2019 18:56:49 +1100 (AEDT) Received: from localhost ([::1]:48006 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iY3o6-0005UF-Sc for incoming@patchwork.ozlabs.org; Fri, 22 Nov 2019 02:56:46 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:39583) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iY3gd-0005lw-Tb for qemu-devel@nongnu.org; Fri, 22 Nov 2019 02:49:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iY3gb-0002t1-S0 for qemu-devel@nongnu.org; Fri, 22 Nov 2019 02:49:03 -0500 Received: from mga18.intel.com ([134.134.136.126]:36768) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1iY3gb-0002kF-Ea for qemu-devel@nongnu.org; Fri, 22 Nov 2019 02:49:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 Nov 2019 23:49:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="210178551" Received: from tao-optiplex-7060.sh.intel.com ([10.239.159.36]) by orsmga003.jf.intel.com with ESMTP; 21 Nov 2019 23:48:57 -0800 From: Tao Xu To: mst@redhat.com, imammedo@redhat.com, eblake@redhat.com, ehabkost@redhat.com, marcel.apfelbaum@gmail.com, armbru@redhat.com, sw@weilnetz.de, mdroth@linux.vnet.ibm.com, thuth@redhat.com, lvivier@redhat.com Subject: [PATCH v17 09/14] numa: Extend CLI to provide memory side cache information Date: Fri, 22 Nov 2019 15:48:21 +0800 Message-Id: <20191122074826.1373-10-tao3.xu@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20191122074826.1373-1-tao3.xu@intel.com> References: <20191122074826.1373-1-tao3.xu@intel.com> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 134.134.136.126 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: jingqi.liu@intel.com, tao3.xu@intel.com, fan.du@intel.com, qemu-devel@nongnu.org, Daniel Black , jonathan.cameron@huawei.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: Liu Jingqi Add -numa hmat-cache option to provide Memory Side Cache Information. These memory attributes help to build Memory Side Cache Information Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT). Reviewed-by: Daniel Black Signed-off-by: Liu Jingqi Signed-off-by: Tao Xu --- Changes in v17: - Use NumaHmatCacheOptions to replace HMAT_Cache_Info (Igor) - Add check for unordered cache level input (Igor) Changes in v16: - Add cross check with hmat_lb data (Igor) - Drop total_levels in struct HMAT_Cache_Info (Igor) - Correct the error table number (Igor) Changes in v15: - Change the QAPI version tag to 5.0 (Eric) --- hw/core/numa.c | 78 +++++++++++++++++++++++++++++++++++++++++++ include/sysemu/numa.h | 5 +++ qapi/machine.json | 78 +++++++++++++++++++++++++++++++++++++++++-- qemu-options.hx | 16 +++++++-- 4 files changed, 173 insertions(+), 4 deletions(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index 70bc8a1081..7b1a9531d2 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -363,6 +363,71 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node, g_array_append_val(hmat_lb->list, lb_data); } +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node, + Error **errp) +{ + int nb_numa_nodes = ms->numa_state->num_nodes; + NodeInfo *numa_info = ms->numa_state->nodes; + NumaHmatCacheOptions *hmat_cache = NULL; + + if (node->node_id >= nb_numa_nodes) { + error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less " + "than %d", node->node_id, nb_numa_nodes); + return; + } + + if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) { + error_setg(errp, "The latency and bandwidth information of " + "node-id=%" PRIu32 " should be provided before memory side " + "cache attributes", node->node_id); + return; + } + + if (node->level >= HMAT_LB_LEVELS) { + error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or " + "equal to %d", node->level, HMAT_LB_LEVELS - 1); + return; + } + assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX); + assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX); + if (ms->numa_state->hmat_cache[node->node_id][node->level]) { + error_setg(errp, "Duplicate configuration of the side cache for " + "node-id=%" PRIu32 " and level=%" PRIu8, + node->node_id, node->level); + return; + } + + if ((node->level > 1) && + ms->numa_state->hmat_cache[node->node_id][node->level - 1] && + (node->size >= + ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) { + error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8 + " should be small than the size(%" PRIu64 ") of " + "level=%" PRIu8, node->size, node->level, + ms->numa_state->hmat_cache[node->node_id] + [node->level - 1]->size, + node->level - 1); + return; + } + + if ((node->level < HMAT_LB_LEVELS - 1) && + ms->numa_state->hmat_cache[node->node_id][node->level + 1] && + (node->size <= + ms->numa_state->hmat_cache[node->node_id][node->level + 1]->size)) { + error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8 + " should be larger than the size(%" PRIu64 ") of " + "level=%" PRIu8, node->size, node->level, + ms->numa_state->hmat_cache[node->node_id] + [node->level + 1]->size, + node->level + 1); + return; + } + + hmat_cache = g_malloc0(sizeof(*hmat_cache)); + memcpy(hmat_cache, node, sizeof(*hmat_cache)); + ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache; +} + void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp) { Error *err = NULL; @@ -414,6 +479,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp) goto end; } break; + case NUMA_OPTIONS_TYPE_HMAT_CACHE: + if (!ms->numa_state->hmat_enabled) { + error_setg(errp, "ACPI Heterogeneous Memory Attribute Table " + "(HMAT) is disabled, enable it with -machine hmat=on " + "before using any of hmat specific options"); + return; + } + + parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err); + if (err) { + goto end; + } + break; default: abort(); } diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 70f93c83d7..ba693cc80b 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -91,6 +91,9 @@ struct NumaState { /* NUMA nodes HMAT Locality Latency and Bandwidth Information */ HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES]; + + /* Memory Side Cache Information Structure */ + NumaHmatCacheOptions *hmat_cache[MAX_NODES][HMAT_LB_LEVELS]; }; typedef struct NumaState NumaState; @@ -98,6 +101,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp); void parse_numa_opts(MachineState *ms); void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node, Error **errp); +void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node, + Error **errp); void numa_complete_configuration(MachineState *ms); void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms); extern QemuOptsList qemu_numa_opts; diff --git a/qapi/machine.json b/qapi/machine.json index 67f5910400..999235bc1b 100644 --- a/qapi/machine.json +++ b/qapi/machine.json @@ -428,10 +428,12 @@ # # @hmat-lb: memory latency and bandwidth information (Since: 5.0) # +# @hmat-cache: memory side cache information (Since: 5.0) +# # Since: 2.1 ## { 'enum': 'NumaOptionsType', - 'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] } + 'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] } ## # @NumaOptions: @@ -447,7 +449,8 @@ 'node': 'NumaNodeOptions', 'dist': 'NumaDistOptions', 'cpu': 'NumaCpuOptions', - 'hmat-lb': 'NumaHmatLBOptions' }} + 'hmat-lb': 'NumaHmatLBOptions', + 'hmat-cache': 'NumaHmatCacheOptions' }} ## # @NumaNodeOptions: @@ -647,6 +650,77 @@ '*latency': 'time', '*bandwidth': 'size' }} +## +# @HmatCacheAssociativity: +# +# Cache associativity in the Memory Side Cache +# Information Structure of HMAT +# +# For more information of @HmatCacheAssociativity see +# the chapter 5.2.27.5: Table 5-147 of ACPI 6.3 spec. +# +# @none: None +# +# @direct: Direct Mapped +# +# @complex: Complex Cache Indexing (implementation specific) +# +# Since: 5.0 +## +{ 'enum': 'HmatCacheAssociativity', + 'data': [ 'none', 'direct', 'complex' ] } + +## +# @HmatCacheWritePolicy: +# +# Cache write policy in the Memory Side Cache +# Information Structure of HMAT +# +# For more information of @HmatCacheWritePolicy see +# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec. +# +# @none: None +# +# @write-back: Write Back (WB) +# +# @write-through: Write Through (WT) +# +# Since: 5.0 +## +{ 'enum': 'HmatCacheWritePolicy', + 'data': [ 'none', 'write-back', 'write-through' ] } + +## +# @NumaHmatCacheOptions: +# +# Set the memory side cache information for a given memory domain. +# +# For more information of @NumaHmatCacheOptions see +# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec. +# +# @node-id: the memory proximity domain to which the memory belongs. +# +# @size: the size of memory side cache in bytes. +# +# @level: the cache level described in this structure. +# +# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing). +# +# @policy: the write policy, none/write-back/write-through. +# +# @line: the cache Line size in bytes. +# +# Since: 5.0 +## +{ 'struct': 'NumaHmatCacheOptions', + 'data': { + 'node-id': 'uint32', + 'size': 'size', + 'level': 'uint8', + 'assoc': 'HmatCacheAssociativity', + 'policy': 'HmatCacheWritePolicy', + 'line': 'uint16' }} + ## # @HostMemPolicy: # diff --git a/qemu-options.hx b/qemu-options.hx index 929d275450..ad0e5aa190 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa, "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n" "-numa dist,src=source,dst=destination,val=distance\n" "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n" - "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n", + "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n" + "-numa hmat-cache,node-id=node,size=size,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n", QEMU_ARCH_ALL) STEXI @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}] @@ -177,6 +178,7 @@ STEXI @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance} @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}] @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{hierarchy},data-type=@var{tpye}[,latency=@var{lat}][,bandwidth=@var{bw}] +@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}] @findex -numa Define a NUMA node and assign RAM and VCPUs to it. Set the NUMA distance from a source node to a destination node. @@ -282,11 +284,19 @@ Note that if NUM is 0, means the corresponding latency or bandwidth information is not provided. And if input numbers without any unit, the latency unit will be 'ns' and the bandwidth will be B/s. +In @samp{hmat-cache} option, @var{node-id} is the NUMA-id of the memory belongs. +@var{size} is the size of memory side cache in bytes. @var{level} is the cache +level described in this structure. @var{assoc} is the cache associativity, +the possible value is 'none/direct(direct-mapped)/complex(complex cache indexing)'. +@var{policy} is the write policy. @var{line} is the cache Line size in bytes. + For example, the following options describe 2 NUMA nodes. Node 0 has 2 cpus and a ram, node 1 has only a ram. The processors in node 0 access memory in node 0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s; The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10 nanoseconds, access-bandwidth is 100 MB/s. +And for memory side cache information, NUMA node 0 and 1 both have 1 level memory +cache, size is 10KB, policy is write-back, the cache Line size is 8 bytes: @example -machine hmat=on \ -m 2G \ @@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s. -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \ -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \ -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \ --numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \ +-numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,policy=write-back,line=8 \ +-numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,policy=write-back,line=8 @end example ETEXI