
[RESEND,v8,09/11] numa: Extend the CLI to provide memory latency and bandwidth information

Message ID 20190731011209.22538-10-tao3.xu@intel.com
State New
Series Build ACPI Heterogeneous Memory Attribute Table (HMAT)

Commit Message

Tao Xu July 31, 2019, 1:12 a.m. UTC
From: Liu Jingqi <jingqi.liu@intel.com>

Add a -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build the
System Locality Latency and Bandwidth Information Structure(s) in the
ACPI Heterogeneous Memory Attribute Table (HMAT).
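
For example (using the syntax documented in the qemu-options.hx change
below), the access latency and bandwidth from node 0 to node 1 can be
described as:

  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10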

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v8.
---
 hw/core/numa.c        | 127 ++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h |   2 +
 qapi/machine.json     | 100 ++++++++++++++++++++++++++++++++-
 qemu-options.hx       |  45 ++++++++++++++-
 4 files changed, 271 insertions(+), 3 deletions(-)

Comments

Daniel Black Aug. 6, 2019, 11:14 p.m. UTC | #1
Liu Jingqi, Tao Xu,

Apologies for the late response to what is already a v8 patch.

The specification of latency and bandwidth here follows the ACPI
specification very closely.

For a qemu interface I think this should use more human-friendly
measurements (a time for latency and a rate for bandwidth).

e.g. rather than
 -object memory-backend-ram,size=64M,id=m0
 -object memory-backend-ram,size=64M,id=m1geosync
 -numa node,nodeid=0,memdev=m0
 -numa node,nodeid=1,memdev=m1geosync,initiator=0
 -numa cpu,node-id=0,socket-id=0
 -numa cpu,node-id=0,socket-id=1
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=1000,latency=5
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=100,latency=10
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10


Suggest a form like:

 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=NUM[fpnm[s]]
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=NUM[KMGP][Bb[ps]]

So:

 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=5Gb


So this would remove the base-[lat,bw] settings and compute those internally in qemu, based on the latency/bandwidth specified with real units.
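
For illustration only (this helper is hypothetical, not part of this
series), qemu could derive the base unit and the 16-bit matrix entry
from a latency given in picoseconds roughly like this:

    #include <stdint.h>

    /* Hypothetical sketch: pick a power-of-ten base unit so that the
     * HMAT matrix entry fits into its 16-bit field, then store the
     * value as a multiple of that base unit. */
    static void latency_ps_to_hmat(uint64_t lat_ps,
                                   uint64_t *base, uint16_t *entry)
    {
        *base = 1;
        while (lat_ps / *base > UINT16_MAX) {
            *base *= 10;
        }
        *entry = (uint16_t)(lat_ps / *base);
    }

A bandwidth given as a rate (e.g. 5Gb) could be normalized to MB/s and
converted the same way.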

Also note that the Linux kernel's HMAT latency display doesn't match the parameters passed in this patch series. Not sure which side is at fault.

Test and results:

x86_64-softmmu/qemu-system-x86_64 -machine pc -nographic -smp 2,sockets=2 \
 -m 128M,slots=2,maxmem=1G \
 -kernel /home/dan/repos/linux/vmlinux -append "console=ttyS0" \
 -object memory-backend-ram,size=64M,id=m0 \
 -object memory-backend-ram,size=64M,id=m1geosync \
 -numa node,nodeid=0,memdev=m0 \
 -numa node,nodeid=1,memdev=m1geosync,initiator=0 \
 -numa cpu,node-id=0,socket-id=0 \
 -numa cpu,node-id=0,socket-id=1 \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=1000,latency=5 \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=100,latency=10 \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10

Booting from ROM..[    0.000000] Linux version 5.3.0-rc2+ (dan@volution)
(gcc version 9.1.1 20190503 (Red Hat 9.1.1-1) (GCC)) #21 SMP Tue Aug 6
17:15:49 AEST 2019


[    0.419303] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:0
[    0.419648] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:1
[    0.419956] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1 Target Domains:2 Base:1000
[    0.420527]   Initiator-Target[0-0]:5 nsec
[    0.420791]   Initiator-Target[0-1]:10 nsec
[    0.421068] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:20
[    0.421447]   Initiator-Target[0-0]:100 MB/s
[    0.421635]   Initiator-Target[0-1]:200 MB/s
Tao Xu Aug. 7, 2019, 1:25 a.m. UTC | #2
On 8/7/2019 7:14 AM, Daniel Black wrote:
> 
> Liu Jingqi, Tao Xu,
> 
> [...]
> 
> Also note that the Linux kernel's HMAT latency display doesn't match
> the parameters passed in this patch series. Not sure which side is at
> fault.
> 
> [...]
> 
> [    0.419956] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1 Target Domains:2 Base:1000
> [    0.420527]   Initiator-Target[0-0]:5 nsec
> [    0.420791]   Initiator-Target[0-1]:10 nsec
> [    0.421068] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:20
> [    0.421447]   Initiator-Target[0-0]:100 MB/s
> [    0.421635]   Initiator-Target[0-1]:200 MB/s
> 
This is because only the first base-lat that is set takes effect: when 
different base-lat values are given, as in your test case, the first one 
(1000) is used for every entry, so the result is 5 * 1000 ps = 5000 ps = 
5 ns. But your suggestion is reasonable; the current interface is not 
user-friendly, so we will follow your suggestions. Thank you!
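
(Concretely: the kernel prints Base:1000 for both rows, so the entries
decode as 5 * 1000 ps = 5 ns and 10 * 1000 ps = 10 ns, whereas the
second row on the command line intended 10 * 100 ps = 1 ns.)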

Patch

diff --git a/hw/core/numa.c b/hw/core/numa.c
index cfb6339810..83ead77191 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -37,6 +37,7 @@ 
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
+#include "hw/acpi/hmat.h"
 
 QemuOptsList qemu_numa_opts = {
     .name = "numa",
@@ -183,6 +184,126 @@  void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                        Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
+    HMAT_LB_Info *hmat_lb = NULL;
+
+    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+        if (!node->has_latency) {
+            error_setg(errp, "Missing 'latency' option.");
+            return;
+        }
+        if (node->has_bandwidth) {
+            error_setg(errp, "Invalid option 'bandwidth' since "
+                       "the data type is latency.");
+            return;
+        }
+        if (node->has_base_bw) {
+            error_setg(errp, "Invalid option 'base_bw' since "
+                       "the data type is latency.");
+            return;
+        }
+    }
+
+    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+        if (!node->has_bandwidth) {
+            error_setg(errp, "Missing 'bandwidth' option.");
+            return;
+        }
+        if (node->has_latency) {
+            error_setg(errp, "Invalid option 'latency' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+        if (node->has_base_lat) {
+            error_setg(errp, "Invalid option 'base_lat' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+    }
+
+    if (node->initiator >= nb_numa_nodes) {
+        error_setg(errp, "Invalid initiator=%"
+                   PRIu16 ", it should be less than %d.",
+                   node->initiator, nb_numa_nodes);
+        return;
+    }
+    if (!numa_info[node->initiator].has_cpu) {
+        error_setg(errp, "Invalid initiator=%"
+                   PRIu16 ", it isn't an initiator proximity domain.",
+                   node->initiator);
+        return;
+    }
+
+    if (node->target >= nb_numa_nodes) {
+        error_setg(errp, "Invalid target=%"
+                   PRIu16 ", it should be less than %d.",
+                   node->target, nb_numa_nodes);
+        return;
+    }
+    if (!numa_info[node->target].initiator_valid) {
+        error_setg(errp, "Invalid target=%"
+                   PRIu16 ", it hasn't a valid initiator proximity domain.",
+                   node->target);
+        return;
+    }
+
+    if (node->has_latency) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        } else if (hmat_lb->latency[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the latency for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        /* Only the first setting of the base unit takes effect. */
+        if ((hmat_lb->base_lat == 0) && (node->has_base_lat)) {
+            hmat_lb->base_lat = node->base_lat;
+        }
+
+        hmat_lb->latency[node->initiator][node->target] = node->latency;
+    }
+
+    if (node->has_bandwidth) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the bandwidth for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        /* Only the first setting of the base unit takes effect. */
+        if (hmat_lb->base_bw == 0) {
+            if (!node->has_base_bw) {
+                error_setg(errp, "Missing 'base-bw' option");
+                return;
+            } else {
+                hmat_lb->base_bw = node->base_bw;
+            }
+        }
+
+        hmat_lb->bandwidth[node->initiator][node->target] = node->bandwidth;
+    }
+
+    if (hmat_lb) {
+        hmat_lb->hierarchy = node->hierarchy;
+        hmat_lb->data_type = node->data_type;
+    }
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -221,6 +342,12 @@  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
                                   &err);
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_LB:
+        parse_numa_hmat_lb(ms, &object->u.hmat_lb, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 1ed3362917..f0857b7ee6 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -41,6 +41,8 @@  typedef struct NumaState NumaState;
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                        Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 05e367d26a..25f5b0321d 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -377,10 +377,12 @@ 
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -395,7 +397,8 @@ 
   'data': {
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
-    'cpu': 'NumaCpuOptions' }}
+    'cpu': 'NumaCpuOptions',
+    'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -504,6 +507,99 @@ 
    'base': 'CpuInstanceProperties',
    'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information about @HmatLBMemoryHierarchy, see chapter
+# 5.2.27.4, Table 5-142, Field "Flags" of the ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level of the memory side cache
+#
+# @second-level: second level of the memory side cache
+#
+# @third-level: third level of the memory side cache
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information about @HmatLBDataType, see chapter
+# 5.2.27.4, Table 5-142, Field "Data Type" of the ACPI 6.3 spec.
+#
+# @access-latency: access latency (picoseconds)
+#
+# @read-latency: read latency (picoseconds)
+#
+# @write-latency: write latency (picoseconds)
+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information about @NumaHmatLBOptions, see chapter
+# 5.2.27.4, Table 5-142 of the ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates whether the structure
+#             describes memory or a memory side cache.
+#
+# @data-type: the type of data represented: access/read/write latency
+#             or bandwidth of the memory, or hit latency or bandwidth
+#             of the memory side cache.
+#
+# @base-lat: the base unit for latency in picoseconds.
+#
+# @base-bw: the base unit for bandwidth in megabytes per second (MB/s).
+#
+# @latency: the latency value, in multiples of @base-lat, from the
+#           @initiator to the @target proximity domain.
+#
+# @bandwidth: the bandwidth value, in multiples of @base-bw, between
+#             the @initiator and the @target proximity domain.
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*base-lat': 'uint64',
+    '*base-bw': 'uint64',
+    '*latency': 'uint16',
+    '*bandwidth': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index c480781992..609ea40086 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -164,16 +164,19 @@  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,base-lat=blat][,base-bw=bbw][,latency=lat][,bandwidth=bw]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
+@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,base-lat=@var{blat}][,base-bw=@var{bbw}][,latency=@var{lat}][,bandwidth=@var{bw}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
+Set the ACPI Heterogeneous Memory Attributes for the given nodes.
 
 Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
@@ -250,6 +253,46 @@  specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
 @option{-smp} options to allocate RAM and VCPUs respectively.
 
+Use @samp{hmat-lb} to set the System Locality Latency and Bandwidth
+Information between initiator and target NUMA nodes in the ACPI
+Heterogeneous Memory Attribute Table (HMAT). An initiator NUMA node can
+generate memory requests, and usually contains one or more processors;
+a target NUMA node contains addressable memory.
+
+In the @samp{hmat-lb} option, @var{node} are NUMA node IDs. The @var{str} of
+'hierarchy' is the memory hierarchy of the target NUMA node: if @var{str} is
+'memory', the structure represents the memory performance; if @var{str} is
+'first-level|second-level|third-level', the structure represents the
+aggregated performance of the memory side caches for each domain. The
+@var{str} of 'data-type' is the type of data represented by this structure
+instance: if 'hierarchy' is 'memory', 'data-type' is the 'access|read|write'
+latency (picoseconds) or 'access|read|write' bandwidth (MB/s) of the target
+memory; if 'hierarchy' is 'first-level|second-level|third-level', 'data-type'
+is the 'access|read|write' hit latency (picoseconds) or hit bandwidth of the
+target memory side cache. @var{blat} or @var{bbw} is the base unit used for
+normalizing the matrix entry values (which store the latency or bandwidth
+values): the base unit for latency is in picoseconds, and the base unit for
+bandwidth is in megabytes per second (MB/s). Note: due to the minimum matrix
+entry value being 10, the base unit corresponds to an entry value of 10, and
+@var{blat} and @var{bbw} must be integers. @var{lat} or @var{bw} is the
+latency or bandwidth value, expressed in multiples of the base unit.
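+
+For instance, with base-lat=10 and latency=5 the resulting access latency
+is 5 * 10 = 50 picoseconds, and with base-bw=20 and bandwidth=5 the
+resulting access bandwidth is 5 * 20 = 100 MB/s.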
+
+For example, the following options define NUMA nodes 0 and 1. Node 0 has 2
+CPUs and some RAM; node 1 has only RAM. The processors in node 0 access
+memory in node 0 with access-latency 5 (base latency 10 ps, i.e. 50 ps) and
+access-bandwidth 5 (base bandwidth 20 MB/s, i.e. 100 MB/s); they access
+memory in node 1 with access-latency 10 (i.e. 100 ps) and access-bandwidth
+10 (i.e. 200 MB/s).
+@example
+-m 2G \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
+-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 -numa node,nodeid=1,memdev=ram-node1 \
+-smp 2 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5 \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10 \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10
+@end example
+
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,