
[RFC,v3,1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu

Message ID 20210516103228.37792-2-wangyanan55@huawei.com
State New
Series hw/arm/virt: Introduce cluster cpu topology support

Commit Message

wangyanan (Y) May 16, 2021, 10:32 a.m. UTC
In implementations of the ARM architecture, a cpu hierarchy of at
most "sockets/dies/clusters/cores/threads" can be defined. For
example, the ARM64 server chip Kunpeng 920 has 2 sockets in total,
2 NUMA nodes (which are also cpu dies) in each socket, 6 clusters in
each NUMA node, and 4 cores in each cluster, and it doesn't support
SMT. Clusters within the same NUMA node share an L3 cache and cores
within the same cluster share an L2 cache.
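
As a quick check of the arithmetic in the example above, the per-level counts can be multiplied out. This is only a minimal sketch: the struct and function names are hypothetical and are not part of QEMU or of this series.

```c
#include <stdio.h>

/* Hypothetical container for the per-level counts described above. */
struct topo_counts {
    unsigned sockets;   /* sockets in the system */
    unsigned dies;      /* dies (NUMA nodes) per socket */
    unsigned clusters;  /* clusters per die */
    unsigned cores;     /* cores per cluster */
    unsigned threads;   /* threads per core (1 means no SMT) */
};

/* Total logical cpus is the product of the counts at every level. */
static unsigned total_cpus(const struct topo_counts *t)
{
    return t->sockets * t->dies * t->clusters * t->cores * t->threads;
}

/* Print one line summarizing the hierarchy and its total. */
static void print_topo(const struct topo_counts *t)
{
    printf("%u sockets x %u dies x %u clusters x %u cores x %u threads = %u cpus\n",
           t->sockets, t->dies, t->clusters, t->cores, t->threads,
           total_cpus(t));
}
```

With the Kunpeng 920 figures (2, 2, 6, 4, 1), total_cpus() returns 96.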

The cache affinity of ARM clusters has been shown to improve kernel
scheduling performance, and a patchset has been posted that adds a
general sched_domain for clusters and a cluster level in the
arch-neutral cpu topology struct, as shown below.

struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

In virtualization, exposing the cluster level topology to the guest
kernel may also improve scheduling performance. So let's add
-smp clusters=* command line support for ARM cpus, so that users
can define a four-level cpu hierarchy for machines:
sockets/clusters/cores/threads.
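
To illustrate what a four-level sockets/clusters/cores/threads hierarchy means for an individual cpu, the sketch below splits a linear cpu index into per-level IDs. It assumes cpus are enumerated with threads varying fastest and sockets slowest; that ordering, and all names used here, are assumptions made for illustration rather than what this series implements.

```c
#include <stdio.h>

/* Hypothetical per-cpu IDs for the four-level hierarchy. */
struct cpu_ids {
    unsigned socket_id;
    unsigned cluster_id;  /* index of the cluster within its socket */
    unsigned core_id;     /* index of the core within its cluster */
    unsigned thread_id;   /* index of the thread within its core */
};

/*
 * Split a linear cpu index into IDs, assuming threads vary fastest,
 * then cores, then clusters, then sockets. All counts must be non-zero.
 */
static struct cpu_ids cpu_index_to_ids(unsigned idx, unsigned clusters,
                                       unsigned cores, unsigned threads)
{
    struct cpu_ids ids;

    ids.thread_id  = idx % threads;
    ids.core_id    = (idx / threads) % cores;
    ids.cluster_id = (idx / (threads * cores)) % clusters;
    ids.socket_id  = idx / (threads * cores * clusters);
    return ids;
}

/* Print the IDs of one cpu on a single line. */
static void print_ids(unsigned idx, const struct cpu_ids *ids)
{
    printf("cpu %u: socket %u, cluster %u, core %u, thread %u\n",
           idx, ids->socket_id, ids->cluster_id, ids->core_id,
           ids->thread_id);
}
```

For example, with 2 clusters per socket, 4 cores per cluster and 1 thread per core, cpu index 13 maps to socket 1, cluster 1, core 1, thread 0.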

Because clusters are currently supported only for ARM cpus, the new
member "smp_clusters" is added only to the VirtMachineState structure.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 include/hw/arm/virt.h |  1 +
 qemu-options.hx       | 26 +++++++++++++++-----------
 softmmu/vl.c          |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

Comments

Andrew Jones May 17, 2021, 9:07 a.m. UTC | #1
On Sun, May 16, 2021 at 06:32:25PM +0800, Yanan Wang wrote:
> In implementations of ARM architecture, at most there could be a
> cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
> For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
> 2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
> each NUMA node, 4 cores in each cluster, and doesn't support SMT.
> Clusters within the same NUMA share a L3 cache and cores within
> the same cluster share a L2 cache.
> 
> The cache affinity of ARM cluster has been proved to improve the
> kernel scheduling performance and a patchset has been posted, in
> which a general sched_domain for clusters was added and a cluster
> level was added in the arch-neutral cpu topology struct like below.
> 
> struct cpu_topology {
>     int thread_id;
>     int core_id;
>     int cluster_id;
>     int package_id;
>     int llc_id;
>     cpumask_t thread_sibling;
>     cpumask_t core_sibling;
>     cpumask_t cluster_sibling;
>     cpumask_t llc_sibling;
> }
> 
> In virtuallization, exposing the cluster level topology to guest
> kernel may also improve the scheduling performance. So let's add
> the -smp, clusters=* command line support for ARM cpu, then users
> will be able to define a four-level cpu hierarchy for machines
> and it will be sockets/clusters/cores/threads.
> 
> Because we only support clusters for ARM cpu currently, a new member
> "smp_clusters" is only added to the VirtMachineState structure.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  include/hw/arm/virt.h |  1 +
>  qemu-options.hx       | 26 +++++++++++++++-----------
>  softmmu/vl.c          |  3 +++
>  3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index f546dd2023..74fff9667b 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -156,6 +156,7 @@ struct VirtMachineState {
>      char *pciehb_nodename;
>      const int *irqmap;
>      int fdt_size;
> +    unsigned smp_clusters;
>      uint32_t clock_phandle;
>      uint32_t gic_phandle;
>      uint32_t msi_phandle;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bd97086c21..245eb415a6 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -184,25 +184,29 @@ SRST
>  ERST
>  
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
>      "                set the number of CPUs to 'n' [default=1]\n"
>      "                maxcpus= maximum number of total cpus, including\n"
>      "                offline CPUs for hotplug, etc\n"
> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> +    "                cores= number of CPU cores on one socket\n"
> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>      "                threads= number of threads on one CPU core\n"
> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>      "                dies= number of CPU dies on one socket (for PC only)\n"
>      "                sockets= number of discrete sockets in the system\n",
>          QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
> -    are supported. On Sparc32 target, Linux limits the number of usable
> -    CPUs to 4. For the PC target, the number of cores per die, the
> -    number of threads per cores, the number of dies per packages and the
> -    total number of sockets can be specified. Missing values will be
> -    computed. If any on the three values is given, the total number of
> -    CPUs n can be omitted. maxcpus specifies the maximum number of
> -    hotpluggable CPUs.
> +``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
> +    CPUs are supported. On the Sparc32 target, Linux limits the number
> +    of usable CPUs to 4. For the PC target, the number of threads per
> +    core, the number of cores per die, the number of dies per package
> +    and the total number of sockets can be specified. For the ARM target,
> +    the number of threads per core, the number of cores per cluster, the
> +    number of clusters per socket and the total number of sockets can be
> +    specified. And missing values will be computed. If any of the five
                  ^ Why did you add this 'And'?
> +    values is given, the total number of CPUs n can be omitted.

The last two sentences are not valid for Arm, which requires most of its
parameters to be given.

> Maxcpus
> +    specifies the maximum number of hotpluggable CPUs.
>  
>      For the ARM target, at least one of cpus or maxcpus must be provided.
>      Threads will default to 1 if not provided. Sockets and cores must be
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 307944aef3..69a5c73ef7 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
>          }, {
>              .name = "dies",
>              .type = QEMU_OPT_NUMBER,
> +        }, {
> +            .name = "clusters",
> +            .type = QEMU_OPT_NUMBER,
>          }, {
>              .name = "cores",
>              .type = QEMU_OPT_NUMBER,
> -- 
> 2.19.1
>

Thanks,
drew
wangyanan (Y) May 17, 2021, 3:07 p.m. UTC | #2
On 2021/5/17 17:07, Andrew Jones wrote:
> On Sun, May 16, 2021 at 06:32:25PM +0800, Yanan Wang wrote:
>> In implementations of ARM architecture, at most there could be a
>> cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
>> For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
>> 2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
>> each NUMA node, 4 cores in each cluster, and doesn't support SMT.
>> Clusters within the same NUMA share a L3 cache and cores within
>> the same cluster share a L2 cache.
>>
>> The cache affinity of ARM cluster has been proved to improve the
>> kernel scheduling performance and a patchset has been posted, in
>> which a general sched_domain for clusters was added and a cluster
>> level was added in the arch-neutral cpu topology struct like below.
>>
>> struct cpu_topology {
>>      int thread_id;
>>      int core_id;
>>      int cluster_id;
>>      int package_id;
>>      int llc_id;
>>      cpumask_t thread_sibling;
>>      cpumask_t core_sibling;
>>      cpumask_t cluster_sibling;
>>      cpumask_t llc_sibling;
>> }
>>
>> In virtuallization, exposing the cluster level topology to guest
>> kernel may also improve the scheduling performance. So let's add
>> the -smp, clusters=* command line support for ARM cpu, then users
>> will be able to define a four-level cpu hierarchy for machines
>> and it will be sockets/clusters/cores/threads.
>>
>> Because we only support clusters for ARM cpu currently, a new member
>> "smp_clusters" is only added to the VirtMachineState structure.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   include/hw/arm/virt.h |  1 +
>>   qemu-options.hx       | 26 +++++++++++++++-----------
>>   softmmu/vl.c          |  3 +++
>>   3 files changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index f546dd2023..74fff9667b 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -156,6 +156,7 @@ struct VirtMachineState {
>>       char *pciehb_nodename;
>>       const int *irqmap;
>>       int fdt_size;
>> +    unsigned smp_clusters;
>>       uint32_t clock_phandle;
>>       uint32_t gic_phandle;
>>       uint32_t msi_phandle;
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index bd97086c21..245eb415a6 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -184,25 +184,29 @@ SRST
>>   ERST
>>   
>>   DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
>> +    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
>>       "                set the number of CPUs to 'n' [default=1]\n"
>>       "                maxcpus= maximum number of total cpus, including\n"
>>       "                offline CPUs for hotplug, etc\n"
>> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
>> +    "                cores= number of CPU cores on one socket\n"
>> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>>       "                threads= number of threads on one CPU core\n"
>> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>>       "                dies= number of CPU dies on one socket (for PC only)\n"
>>       "                sockets= number of discrete sockets in the system\n",
>>           QEMU_ARCH_ALL)
>>   SRST
>> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>> -    are supported. On Sparc32 target, Linux limits the number of usable
>> -    CPUs to 4. For the PC target, the number of cores per die, the
>> -    number of threads per cores, the number of dies per packages and the
>> -    total number of sockets can be specified. Missing values will be
>> -    computed. If any on the three values is given, the total number of
>> -    CPUs n can be omitted. maxcpus specifies the maximum number of
>> -    hotpluggable CPUs.
>> +``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
>> +    CPUs are supported. On the Sparc32 target, Linux limits the number
>> +    of usable CPUs to 4. For the PC target, the number of threads per
>> +    core, the number of cores per die, the number of dies per package
>> +    and the total number of sockets can be specified. For the ARM target,
>> +    the number of threads per core, the number of cores per cluster, the
>> +    number of clusters per socket and the total number of sockets can be
>> +    specified. And missing values will be computed. If any of the five
>                    ^ Why did you add this 'And'?
My fault. I will drop it.
>> +    values is given, the total number of CPUs n can be omitted.
> The last two sentences are not valid for Arm, which requires most of its
> parameters to be given.
Yes, indeed. I should state what these two sentences apply to more
*clearly*. I will rearrange the doc in v4.

Thanks,
Yanan
>> Maxcpus
>> +    specifies the maximum number of hotpluggable CPUs.
>>   
>>       For the ARM target, at least one of cpus or maxcpus must be provided.
>>       Threads will default to 1 if not provided. Sockets and cores must be
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 307944aef3..69a5c73ef7 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
>>           }, {
>>               .name = "dies",
>>               .type = QEMU_OPT_NUMBER,
>> +        }, {
>> +            .name = "clusters",
>> +            .type = QEMU_OPT_NUMBER,
>>           }, {
>>               .name = "cores",
>>               .type = QEMU_OPT_NUMBER,
>> -- 
>> 2.19.1
>>
> Thanks,
> drew

Patch

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f546dd2023..74fff9667b 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -156,6 +156,7 @@  struct VirtMachineState {
     char *pciehb_nodename;
     const int *irqmap;
     int fdt_size;
+    unsigned smp_clusters;
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
diff --git a/qemu-options.hx b/qemu-options.hx
index bd97086c21..245eb415a6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,29 @@  SRST
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total cpus, including\n"
     "                offline CPUs for hotplug, etc\n"
-    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+    "                cores= number of CPU cores on one socket\n"
+    "                (it's on one die for PC, and on one cluster for ARM)\n"
     "                threads= number of threads on one CPU core\n"
+    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
     "                dies= number of CPU dies on one socket (for PC only)\n"
     "                sockets= number of discrete sockets in the system\n",
         QEMU_ARCH_ALL)
 SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
-    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
-    are supported. On Sparc32 target, Linux limits the number of usable
-    CPUs to 4. For the PC target, the number of cores per die, the
-    number of threads per cores, the number of dies per packages and the
-    total number of sockets can be specified. Missing values will be
-    computed. If any on the three values is given, the total number of
-    CPUs n can be omitted. maxcpus specifies the maximum number of
-    hotpluggable CPUs.
+``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
+    Simulate an SMP system with n CPUs. On the PC target, up to 255
+    CPUs are supported. On the Sparc32 target, Linux limits the number
+    of usable CPUs to 4. For the PC target, the number of threads per
+    core, the number of cores per die, the number of dies per package
+    and the total number of sockets can be specified. For the ARM target,
+    the number of threads per core, the number of cores per cluster, the
+    number of clusters per socket and the total number of sockets can be
+    specified. And missing values will be computed. If any of the five
+    values is given, the total number of CPUs n can be omitted. Maxcpus
+    specifies the maximum number of hotpluggable CPUs.
 
     For the ARM target, at least one of cpus or maxcpus must be provided.
     Threads will default to 1 if not provided. Sockets and cores must be
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 307944aef3..69a5c73ef7 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -719,6 +719,9 @@  static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,
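
The table entry added above only registers "clusters" as a numeric sub-option of -smp. As a rough, standalone illustration of what accepting such a key=value list involves, here is a small parser sketch. It does not use QEMU's QemuOpts code: the struct, the function name, the 256-byte limit and the error handling are all hypothetical, and unlike QEMU it accepts a bare number in any position as the cpus value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; a field left at 0 means "not given". */
struct smp_args {
    unsigned cpus, maxcpus, sockets, dies, clusters, cores, threads;
};

/* Map a key to its field, or return NULL for an unknown key. */
static unsigned *smp_field(struct smp_args *a, const char *key)
{
    if (!strcmp(key, "cpus"))     return &a->cpus;
    if (!strcmp(key, "maxcpus"))  return &a->maxcpus;
    if (!strcmp(key, "sockets"))  return &a->sockets;
    if (!strcmp(key, "dies"))     return &a->dies;
    if (!strcmp(key, "clusters")) return &a->clusters;
    if (!strcmp(key, "cores"))    return &a->cores;
    if (!strcmp(key, "threads"))  return &a->threads;
    return NULL;
}

/* Parse e.g. "8,sockets=2,clusters=2"; returns 0 on success, -1 on error. */
static int parse_smp(const char *arg, struct smp_args *out)
{
    char buf[256];
    char *tok;

    memset(out, 0, sizeof(*out));
    if (strlen(arg) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, arg);  /* strtok() modifies its input, so work on a copy */

    for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        const char *key = eq ? tok : "cpus";  /* bare "n" is cpus=n */
        const char *val = eq ? eq + 1 : tok;
        unsigned *field;
        char *end;
        unsigned long n;

        if (eq) {
            *eq = '\0';  /* terminate the key */
        }
        if (*val < '0' || *val > '9') {
            return -1;   /* value must start with a digit */
        }
        n = strtoul(val, &end, 10);
        if (*end != '\0') {
            return -1;   /* trailing junk after the number */
        }
        field = smp_field(out, key);
        if (!field) {
            return -1;   /* unknown sub-option */
        }
        *field = (unsigned)n;
    }
    return 0;
}
```

Under these assumptions, parse_smp("8,sockets=2,clusters=2,cores=2", &a) succeeds and leaves a.clusters set to 2, while an unknown key or a non-numeric value is rejected.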