[v4] hw/arm/virt: Don't create device-tree node for empty NUMA node

Message ID 20211015104909.16722-1-gshan@redhat.com
State New

Commit Message

Gavin Shan Oct. 15, 2021, 10:49 a.m. UTC
Empty NUMA nodes, where no memory resides, are allowed. For
example, the following command line specifies two empty NUMA nodes.
With this, QEMU fails to boot because of conflicting device-tree
node names, as the following error message indicates.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host               \
  -cpu host -smp 4,sockets=2,cores=2,threads=1            \
  -m 1024M,slots=16,maxmem=64G                            \
  -object memory-backend-ram,id=mem0,size=512M            \
  -object memory-backend-ram,id=mem1,size=512M            \
  -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
  -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
  -numa node,nodeid=2                                     \
  -numa node,nodeid=3
    :
  qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS

As specified by the Linux device-tree binding document, device-tree
nodes for these empty NUMA nodes shouldn't be generated. However,
the corresponding NUMA node IDs should be included in the distance
map. As memory hotplug through device-tree doesn't exist on ARM64
so far, it's pointless to expose the empty NUMA nodes through device-tree.
So this patch simply skips populating the device-tree nodes for these
empty NUMA nodes to avoid the error, so that QEMU can start successfully.
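
The collision described above can be reproduced outside QEMU with a
minimal sketch. It assumes the node name is derived only from the base
address (as the "/memory@80000000" in the error message suggests) and
that the base advances by each node's size. build_memory_node_names(),
MAX_NODES, NAME_LEN and the skip_empty flag are hypothetical names used
for illustration only; they are not part of hw/arm/boot.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8   /* hypothetical upper bound for this sketch */
#define NAME_LEN  32  /* large enough for "/memory@" plus 16 hex digits */

/*
 * Generate the "/memory@<base>" name for each NUMA node, the way the
 * loop in the patch walks the nodes. Returns the number of names
 * written, or -1 as soon as a name repeats (the analogue of
 * FDT_ERR_EXISTS). With skip_empty set, zero-sized nodes are skipped,
 * which is the behaviour this patch adds.
 */
int build_memory_node_names(uint64_t loader_start, const uint64_t *node_mem,
                            int num_nodes, bool skip_empty,
                            char names[][NAME_LEN])
{
    uint64_t mem_base = loader_start;
    int count = 0;

    assert(num_nodes <= MAX_NODES);

    for (int i = 0; i < num_nodes; i++) {
        uint64_t mem_len = node_mem[i];
        char name[NAME_LEN];

        if (skip_empty && mem_len == 0) {
            continue;
        }

        snprintf(name, sizeof(name), "/memory@%" PRIx64, mem_base);
        for (int j = 0; j < count; j++) {
            if (strcmp(names[j], name) == 0) {
                return -1; /* two nodes share the same base address */
            }
        }
        strcpy(names[count++], name);

        /* An empty node leaves mem_base unchanged for the next node. */
        mem_base += mem_len;
    }
    return count;
}
```

Called with a base of 0x40000000 (assumed here as the start of RAM) and
node sizes of 512M, 512M, 0 and 0, the function returns -1 when
skip_empty is false, because nodes 2 and 3 both map to base 0x80000000.
With skip_empty set it returns 2.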

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
v4: Drop the patch that enforces the distance map, as memory hotplug
    through device-tree has never been supported on ARM64. It's
    pointless to expose these empty NUMA nodes. Also add comments
    to explain the code changes included in this patch, as Drew
    suggested.
---
 hw/arm/boot.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Andrew Jones Oct. 15, 2021, 12:22 p.m. UTC | #1
On Fri, Oct 15, 2021 at 06:49:09PM +0800, Gavin Shan wrote:
> The empty NUMA node, where no memory resides, are allowed. For
> example, the following command line specifies two empty NUMA nodes.
> With this, QEMU fails to boot because of the conflicting device-tree
> node names, as the following error message indicates.
> 
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>   -accel kvm -machine virt,gic-version=host               \
>   -cpu host -smp 4,sockets=2,cores=2,threads=1            \
>   -m 1024M,slots=16,maxmem=64G                            \
>   -object memory-backend-ram,id=mem0,size=512M            \
>   -object memory-backend-ram,id=mem1,size=512M            \
>   -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
>   -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
>   -numa node,nodeid=2                                     \
>   -numa node,nodeid=3
>     :
>   qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS
> 
> As specified by linux device-tree binding document, the device-tree
> nodes for these empty NUMA nodes shouldn't be generated. However,
> the corresponding NUMA node IDs should be included in the distance
> map. As the memory hotplug through device-tree on ARM64 isn't existing
> so far, it's pointless to expose the empty NUMA nodes through device-tree.

Instead of "it's pointless to expose the empty NUMA nodes through
device-tree", how about

 it's not necessary to require the user to provide a distance map.
 Furthermore, the default distance map Linux generates may even be
 sufficient.

> So this simply skips populating the device-tree nodes for these empty
> NUMA nodes to avoid the error, so that QEMU can be started successfully.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> v4: Drop patch to enforce distance-map as memory hotplug through
>     device-tree is never supported on ARM64. It's pointless to
>     expose these empty NUMA nodes. Besides, comments added to
>     explain the code changes included in this patch as Drew
>     suggested.
> ---
>  hw/arm/boot.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 57efb61ee4..e05c1c149c 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -599,10 +599,24 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>      }
>      g_strfreev(node_path);
>  
> +    /*
> +     * According to Linux NUMA binding document, the device tree nodes
> +     * for the empty NUMA nodes shouldn't be generated, but their NUMA
> +     * node IDs should be included in the distance map instead. However,
> +     * it's pointless to expose the empty NUMA nodes as memory hotplug
> +     * through device tree is never supported. We simply skip generating
> +     * their device tree nodes to avoid the unexpected device tree
> +     * generating failure due to the duplicated names of these empty
> +     * NUMA nodes.
> +     */

    /*
     * We drop all the memory nodes which correspond to empty NUMA nodes from
     * the device tree, because the Linux NUMA binding document states they
     * should not be generated.  Linux will get the NUMA node IDs of the empty
     * NUMA nodes from the distance map if they are needed.  This means QEMU
     * users may be obliged to provide command lines which configure distance
     * maps when the empty NUMA node IDs are needed and Linux's default
     * distance map isn't sufficient.
     */
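
For reference, a default distance map of the kind mentioned above can
be sketched as follows. This assumes the conventional values of 10 for
a node's distance to itself and 20 for the distance to any other node.
fill_default_distance_map() and the flat row-major layout are
hypothetical choices for illustration, not code from Linux or QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_DISTANCE  10  /* assumed default: node to itself */
#define REMOTE_DISTANCE 20  /* assumed default: node to any other node */

/*
 * Fill a num_nodes x num_nodes matrix, stored row-major in map, with
 * the assumed default distances. Empty NUMA nodes get a row and a
 * column like any other node, since only their IDs matter here.
 */
void fill_default_distance_map(uint8_t *map, int num_nodes)
{
    assert(map != NULL && num_nodes > 0);

    for (int from = 0; from < num_nodes; from++) {
        for (int to = 0; to < num_nodes; to++) {
            map[from * num_nodes + to] =
                (from == to) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
        }
    }
}
```

For the four-node example in the commit message, this yields 10 on the
diagonal and 20 everywhere else, including the rows for the empty nodes
2 and 3.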



>      if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>          mem_base = binfo->loader_start;
>          for (i = 0; i < ms->numa_state->num_nodes; i++) {
>              mem_len = ms->numa_state->nodes[i].node_mem;
> +            if (!mem_len) {
> +                continue;
> +            }
> +
>              rc = fdt_add_memory_node(fdt, acells, mem_base,
>                                       scells, mem_len, i);
>              if (rc < 0) {
> -- 
> 2.23.0
>

Thanks,
drew

Gavin Shan Oct. 15, 2021, 12:43 p.m. UTC | #2
On 10/15/21 11:22 PM, Andrew Jones wrote:
> On Fri, Oct 15, 2021 at 06:49:09PM +0800, Gavin Shan wrote:
>> The empty NUMA node, where no memory resides, are allowed. For
>> example, the following command line specifies two empty NUMA nodes.
>> With this, QEMU fails to boot because of the conflicting device-tree
>> node names, as the following error message indicates.
>>
>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>    -accel kvm -machine virt,gic-version=host               \
>>    -cpu host -smp 4,sockets=2,cores=2,threads=1            \
>>    -m 1024M,slots=16,maxmem=64G                            \
>>    -object memory-backend-ram,id=mem0,size=512M            \
>>    -object memory-backend-ram,id=mem1,size=512M            \
>>    -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
>>    -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
>>    -numa node,nodeid=2                                     \
>>    -numa node,nodeid=3
>>      :
>>    qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS
>>
>> As specified by linux device-tree binding document, the device-tree
>> nodes for these empty NUMA nodes shouldn't be generated. However,
>> the corresponding NUMA node IDs should be included in the distance
>> map. As the memory hotplug through device-tree on ARM64 isn't existing
>> so far, it's pointless to expose the empty NUMA nodes through device-tree.
> 
> Instead of "it's pointless to expose the empty NUMA nodes through
> device-tree", how about
> 
>   it's not necessary to require the user to provide a distance map.
>   Furthermore, the default distance map Linux generates may even be
>   sufficient.
> 

Yes, much better.

>> So this simply skips populating the device-tree nodes for these empty
>> NUMA nodes to avoid the error, so that QEMU can be started successfully.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>> v4: Drop patch to enforce distance-map as memory hotplug through
>>      device-tree is never supported on ARM64. It's pointless to
>>      expose these empty NUMA nodes. Besides, comments added to
>>      explain the code changes included in this patch as Drew
>>      suggested.
>> ---
>>   hw/arm/boot.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
>> index 57efb61ee4..e05c1c149c 100644
>> --- a/hw/arm/boot.c
>> +++ b/hw/arm/boot.c
>> @@ -599,10 +599,24 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>>       }
>>       g_strfreev(node_path);
>>   
>> +    /*
>> +     * According to Linux NUMA binding document, the device tree nodes
>> +     * for the empty NUMA nodes shouldn't be generated, but their NUMA
>> +     * node IDs should be included in the distance map instead. However,
>> +     * it's pointless to expose the empty NUMA nodes as memory hotplug
>> +     * through device tree is never supported. We simply skip generating
>> +     * their device tree nodes to avoid the unexpected device tree
>> +     * generating failure due to the duplicated names of these empty
>> +     * NUMA nodes.
>> +     */
> 
>      /*
>       * We drop all the memory nodes which correspond to empty NUMA nodes from
>       * the device tree, because the Linux NUMA binding document states they
>       * should not be generated.  Linux will get the NUMA node IDs of the empty
>       * NUMA nodes from the distance map if they are needed.  This means QEMU
>       * users may be obliged to provide command lines which configure distance
>       * maps when the empty NUMA node IDs are needed and Linux's default
>       * distance map isn't sufficient.
>       */
> 

Thanks, Drew. Copied and pasted into v5 :)

> 
> 
>>       if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>>           mem_base = binfo->loader_start;
>>           for (i = 0; i < ms->numa_state->num_nodes; i++) {
>>               mem_len = ms->numa_state->nodes[i].node_mem;
>> +            if (!mem_len) {
>> +                continue;
>> +            }
>> +
>>               rc = fdt_add_memory_node(fdt, acells, mem_base,
>>                                        scells, mem_len, i);
>>               if (rc < 0) {

Thanks,
Gavin

Patch

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 57efb61ee4..e05c1c149c 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -599,10 +599,24 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     }
     g_strfreev(node_path);
 
+    /*
+     * According to Linux NUMA binding document, the device tree nodes
+     * for the empty NUMA nodes shouldn't be generated, but their NUMA
+     * node IDs should be included in the distance map instead. However,
+     * it's pointless to expose the empty NUMA nodes as memory hotplug
+     * through device tree is never supported. We simply skip generating
+     * their device tree nodes to avoid the unexpected device tree
+     * generating failure due to the duplicated names of these empty
+     * NUMA nodes.
+     */
     if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
         mem_base = binfo->loader_start;
         for (i = 0; i < ms->numa_state->num_nodes; i++) {
             mem_len = ms->numa_state->nodes[i].node_mem;
+            if (!mem_len) {
+                continue;
+            }
+
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
             if (rc < 0) {