diff mbox series

[3/3] numa: Initialize node initiator with respect to .has_cpu

Message ID d3039c47e4ce1118bc7c4d4f51da6412c6669339.1590753455.git.mprivozn@redhat.com
State New
Headers show
Series Couple of HMAT fixes | expand

Commit Message

Michal Prívozník May 29, 2020, 1:33 p.m. UTC
The initiator attribute of a NUMA node is documented as the 'NUMA
node that has best performance to given NUMA node'. If a NUMA
node has at least one CPU there can hardly be a different node
with better performace and thus all NUMA nodes which have a CPU
are initiators to themselves. Reflect this fact when initializing
the attribute.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 hw/core/numa.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

Igor Mammedov May 29, 2020, 3:09 p.m. UTC | #1
On Fri, 29 May 2020 15:33:48 +0200
Michal Privoznik <mprivozn@redhat.com> wrote:

> The initiator attribute of a NUMA node is documented as the 'NUMA
> node that has best performance to given NUMA node'. If a NUMA
> node has at least one CPU there can hardly be a different node
> with better performace and thus all NUMA nodes which have a CPU
> are initiators to themselves. Reflect this fact when initializing
> the attribute.

It is not true in case of the node is memory-less

> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  hw/core/numa.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 338453461c..1c9bc761cc 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -136,11 +136,15 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>      }
>  
> -    /*
> -     * If not set the initiator, set it to MAX_NODES. And if
> -     * HMAT is enabled and this node has no cpus, QEMU will raise error.
> -     */
> -    numa_info[nodenr].initiator = MAX_NODES;
> +    /* Initialize initiator to either the current NUMA node (if
> +     * it has at least one CPU), or to MAX_NODES. If HMAT is
> +     * enabled an error will be raised later in
> +     * numa_validate_initiator(). */
> +    if (numa_info[nodenr].has_cpu)
> +        numa_info[nodenr].initiator = nodenr;
> +    else
> +        numa_info[nodenr].initiator = MAX_NODES;
> +
>      if (node->has_initiator) {
>          if (!ms->numa_state->hmat_enabled) {
>              error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
Michal Prívozník May 29, 2020, 3:24 p.m. UTC | #2
On 5/29/20 5:09 PM, Igor Mammedov wrote:
> On Fri, 29 May 2020 15:33:48 +0200
> Michal Privoznik <mprivozn@redhat.com> wrote:
> 
>> The initiator attribute of a NUMA node is documented as the 'NUMA
>> node that has best performance to given NUMA node'. If a NUMA
>> node has at least one CPU there can hardly be a different node
>> with better performace and thus all NUMA nodes which have a CPU
>> are initiators to themselves. Reflect this fact when initializing
>> the attribute.
> 
> It is not true in case of the node is memory-less

Ah, so the node has CPUs only then? Okay, right now my libvirt patches 
don't allow that, but formatting initator for all NUMA nodes should be 
trivial.

Michal
Michal Prívozník June 1, 2020, 8:10 a.m. UTC | #3
On 5/29/20 5:09 PM, Igor Mammedov wrote:
> On Fri, 29 May 2020 15:33:48 +0200
> Michal Privoznik <mprivozn@redhat.com> wrote:
> 
>> The initiator attribute of a NUMA node is documented as the 'NUMA
>> node that has best performance to given NUMA node'. If a NUMA
>> node has at least one CPU there can hardly be a different node
>> with better performace and thus all NUMA nodes which have a CPU
>> are initiators to themselves. Reflect this fact when initializing
>> the attribute.
> 
> It is not true in case of the node is memory-less

Are you saying that if there's a memory-less NUMA node, then it needs to 
have initiator set too? Asking mostly out of curiosity because we don't 
allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my 
patches that I'm referring to in cover letter will allow at least 
cpu-less nodes. Should I allow both?

Also, can you shed more light into why machine_set_cpu_numa_node() did 
not override the .initiator?

Thanks,
Michal
Tao Xu June 2, 2020, 8 a.m. UTC | #4
On 6/1/2020 4:10 PM, Michal Privoznik wrote:
> On 5/29/20 5:09 PM, Igor Mammedov wrote:
>> On Fri, 29 May 2020 15:33:48 +0200
>> Michal Privoznik <mprivozn@redhat.com> wrote:
>>
>>> The initiator attribute of a NUMA node is documented as the 'NUMA
>>> node that has best performance to given NUMA node'. If a NUMA
>>> node has at least one CPU there can hardly be a different node
>>> with better performace and thus all NUMA nodes which have a CPU
>>> are initiators to themselves. Reflect this fact when initializing
>>> the attribute.
>>
>> It is not true in case of the node is memory-less
> 
> Are you saying that if there's a memory-less NUMA node, then it needs to
> have initiator set too? Asking mostly out of curiosity because we don't
> allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my
> patches that I'm referring to in cover letter will allow at least
> cpu-less nodes. Should I allow both?
QEMU now is not support memory-less NUMA node, but in hardware may be 
supported. So we reserve this type of NUMA node for future usage. And 
QEMU now can support cpu-less NUMA node, for emulating some "slow" 
memory(like some NVDIMM).

> 
> Also, can you shed more light into why machine_set_cpu_numa_node() did
> not override the .initiator?
> 
> Thanks,
> Michal
>
Michal Prívozník June 3, 2020, 9:16 a.m. UTC | #5
On 6/2/20 10:00 AM, Tao Xu wrote:
> 
> On 6/1/2020 4:10 PM, Michal Privoznik wrote:
>> On 5/29/20 5:09 PM, Igor Mammedov wrote:
>>> On Fri, 29 May 2020 15:33:48 +0200
>>> Michal Privoznik <mprivozn@redhat.com> wrote:
>>>
>>>> The initiator attribute of a NUMA node is documented as the 'NUMA
>>>> node that has best performance to given NUMA node'. If a NUMA
>>>> node has at least one CPU there can hardly be a different node
>>>> with better performace and thus all NUMA nodes which have a CPU
>>>> are initiators to themselves. Reflect this fact when initializing
>>>> the attribute.
>>>
>>> It is not true in case of the node is memory-less
>>
>> Are you saying that if there's a memory-less NUMA node, then it needs to
>> have initiator set too? Asking mostly out of curiosity because we don't
>> allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my
>> patches that I'm referring to in cover letter will allow at least
>> cpu-less nodes. Should I allow both?
> QEMU now is not support memory-less NUMA node, but in hardware may be 
> supported. So we reserve this type of NUMA node for future usage. And 
> QEMU now can support cpu-less NUMA node, for emulating some "slow" 
> memory(like some NVDIMM).

Oh yeah, I understand that. But it doesn't explain why initiator needs 
to be specified for NUMA nodes with cpus and memory, or does it? Maybe 
I'm still misunderstanding what the initiator is.

> 
>>
>> Also, can you shed more light into why machine_set_cpu_numa_node() did
>> not override the .initiator?

And this one is still unanswered too. Because from user's perspective, 
initiator has to be set on all NUMA nodes (if HMAT is enabled) and it 
seems like this auto assignment code is not run/not working.

Michal
Tao Xu June 5, 2020, 1:52 a.m. UTC | #6
On 6/3/20 5:16 PM, Michal Privoznik wrote:
> On 6/2/20 10:00 AM, Tao Xu wrote:
>>
>> On 6/1/2020 4:10 PM, Michal Privoznik wrote:
>>> On 5/29/20 5:09 PM, Igor Mammedov wrote:
>>>> On Fri, 29 May 2020 15:33:48 +0200
>>>> Michal Privoznik <mprivozn@redhat.com> wrote:
>>>>
>>>>> The initiator attribute of a NUMA node is documented as the 'NUMA
>>>>> node that has best performance to given NUMA node'. If a NUMA
>>>>> node has at least one CPU there can hardly be a different node
>>>>> with better performace and thus all NUMA nodes which have a CPU
>>>>> are initiators to themselves. Reflect this fact when initializing
>>>>> the attribute.
>>>>
>>>> It is not true in case of the node is memory-less
>>>
>>> Are you saying that if there's a memory-less NUMA node, then it needs to
>>> have initiator set too? Asking mostly out of curiosity because we don't
>>> allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my
>>> patches that I'm referring to in cover letter will allow at least
>>> cpu-less nodes. Should I allow both?
>> QEMU now is not support memory-less NUMA node, but in hardware may be
>> supported. So we reserve this type of NUMA node for future usage. And
>> QEMU now can support cpu-less NUMA node, for emulating some "slow"
>> memory(like some NVDIMM).
> 
> Oh yeah, I understand that. But it doesn't explain why initiator needs
> to be specified for NUMA nodes with cpus and memory, or does it? Maybe
> I'm still misunderstanding what the initiator is.
> 

Yes, the initiator NUMA nodes with cpus and memory should be itself. In 
ACPI 6.3 spec, initiator is defined as:

This field is valid only if the memory controller
responsible for satisfying the access to memory
belonging to the specified memory proximity
domain is directly attached to an initiator that
belongs to a proximity domain. In that case, this
field contains the integer that represents the
proximity domain to which the initiator (Generic
Initiator or Processor) belongs. This number shall
match the corresponding entry in the SRAT table’s
processor affinity structure (e.g., Processor Local
APIC/SAPIC Affinity Structure, Processor Local
x2APIC Affinity Structure, GICC Affinity Structure) if
the initiator is a processor, or the Generic Initiator
Affinity Structure if the initator is a generic
initiator.
Note: this field provides additional information as
to the initiator node that is closest (as in directly
attached) to the memory address ranges within
the specified memory proximity domain, and
therefore should provide the best performance.

And if in the future, there is a memory-less NUMA node. Because in HMAT 
we describe "Memory" Proximity Domain Attributes Structure, I think we 
should not add memory-less NUMA node into HMAT.

>>
>>>
>>> Also, can you shed more light into why machine_set_cpu_numa_node() did
>>> not override the .initiator?
> 
> And this one is still unanswered too. Because from user's perspective,
> initiator has to be set on all NUMA nodes (if HMAT is enabled) and it
> seems like this auto assignment code is not run/not working.
> 
> Michal
> 

So we check the HMAT configure in hw/core/machine.c 
numa_validate_initiator(NumaState *numa_state) because the initiator 
NUMA nodes with cpus and memory should be itself. And in 
machine_set_cpu_numa_node we didn't use auto assignment way just use 
user's setting in cli (although there is only one right choice for NUMA 
nodes with cpus and memory). But I don't know if it is appropriate to 
auto assign the initiator for NUMA nodes with cpus and memory.
Michal Prívozník June 11, 2020, 2:42 p.m. UTC | #7
On 6/5/20 3:52 AM, Tao Xu wrote:
> On 6/3/20 5:16 PM, Michal Privoznik wrote:
>> On 6/2/20 10:00 AM, Tao Xu wrote:
>>>
>>> On 6/1/2020 4:10 PM, Michal Privoznik wrote:
>>>> On 5/29/20 5:09 PM, Igor Mammedov wrote:
>>>>> On Fri, 29 May 2020 15:33:48 +0200
>>>>> Michal Privoznik <mprivozn@redhat.com> wrote:
>>>>>
>>>>>> The initiator attribute of a NUMA node is documented as the 'NUMA
>>>>>> node that has best performance to given NUMA node'. If a NUMA
>>>>>> node has at least one CPU there can hardly be a different node
>>>>>> with better performace and thus all NUMA nodes which have a CPU
>>>>>> are initiators to themselves. Reflect this fact when initializing
>>>>>> the attribute.
>>>>>
>>>>> It is not true in case of the node is memory-less
>>>>
>>>> Are you saying that if there's a memory-less NUMA node, then it 
>>>> needs to
>>>> have initiator set too? Asking mostly out of curiosity because we don't
>>>> allow memory-less NUMA nodes in Libvirt just yet. Nor cpu-less, but my
>>>> patches that I'm referring to in cover letter will allow at least
>>>> cpu-less nodes. Should I allow both?
>>> QEMU now is not support memory-less NUMA node, but in hardware may be
>>> supported. So we reserve this type of NUMA node for future usage. And
>>> QEMU now can support cpu-less NUMA node, for emulating some "slow"
>>> memory(like some NVDIMM).
>>
>> Oh yeah, I understand that. But it doesn't explain why initiator needs
>> to be specified for NUMA nodes with cpus and memory, or does it? Maybe
>> I'm still misunderstanding what the initiator is.
>>
> 
> Yes, the initiator NUMA nodes with cpus and memory should be itself. In 
> ACPI 6.3 spec, initiator is defined as:
> 
> This field is valid only if the memory controller
> responsible for satisfying the access to memory
> belonging to the specified memory proximity
> domain is directly attached to an initiator that
> belongs to a proximity domain. In that case, this
> field contains the integer that represents the
> proximity domain to which the initiator (Generic
> Initiator or Processor) belongs. This number shall
> match the corresponding entry in the SRAT table’s
> processor affinity structure (e.g., Processor Local
> APIC/SAPIC Affinity Structure, Processor Local
> x2APIC Affinity Structure, GICC Affinity Structure) if
> the initiator is a processor, or the Generic Initiator
> Affinity Structure if the initator is a generic
> initiator.
> Note: this field provides additional information as
> to the initiator node that is closest (as in directly
> attached) to the memory address ranges within
> the specified memory proximity domain, and
> therefore should provide the best performance.
> 
> And if in the future, there is a memory-less NUMA node. Because in HMAT 
> we describe "Memory" Proximity Domain Attributes Structure, I think we 
> should not add memory-less NUMA node into HMAT.

Then I guess something else must be broken. Because as reported here [1] 
if I configure two numa nodes, both with two vCPUs and set initiators of 
each node to itselfs I get the following error message:

qemu-system-x86_64: -numa 
node,nodeid=1,cpus=2-3,initiator=1,memdev=ram-node1: The initiator of 
CPU NUMA node 1 should be itself

Funny about this error message is how contradictory it is. The cmd line 
showed in the error shows the initiator of the node 1 is indeed node 1.

1: https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg00071.html

Michal
diff mbox series

Patch

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 338453461c..1c9bc761cc 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -136,11 +136,15 @@  static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
 
-    /*
-     * If not set the initiator, set it to MAX_NODES. And if
-     * HMAT is enabled and this node has no cpus, QEMU will raise error.
-     */
-    numa_info[nodenr].initiator = MAX_NODES;
+    /* Initialize initiator to either the current NUMA node (if
+     * it has at least one CPU), or to MAX_NODES. If HMAT is
+     * enabled an error will be raised later in
+     * numa_validate_initiator(). */
+    if (numa_info[nodenr].has_cpu)
+        numa_info[nodenr].initiator = nodenr;
+    else
+        numa_info[nodenr].initiator = MAX_NODES;
+
     if (node->has_initiator) {
         if (!ms->numa_state->hmat_enabled) {
             error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "