diff mbox series

[05/10] spapr: make ibm, max-associativity-domains scale with user input

Message ID 20200814205424.543857-6-danielhb413@gmail.com
State New
Headers show
Series pseries NUMA distance rework | expand

Commit Message

Daniel Henrique Barboza Aug. 14, 2020, 8:54 p.m. UTC
The ibm,max-associativity-domains is considering that only a single
associativity domain can exist in the same NUMA level. This is true
today because we do not support any type of NUMA distance user
customization, and all nodes are in the same distance to each other.

To enhance NUMA distance support in the pSeries machine we need to
make this limit flexible. This patch rewrites the max-associativity
logic to consider that multiple associativity domains can co-exist
in the same NUMA level. We're using the legacy_numa() helper to
avoid leaking unneeded guest changes.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

David Gibson Aug. 20, 2020, 2:55 a.m. UTC | #1
On Fri, Aug 14, 2020 at 05:54:19PM -0300, Daniel Henrique Barboza wrote:
> The ibm,max-associativity-domains is considering that only a single
> associativity domain can exist in the same NUMA level. This is true
> today because we do not support any type of NUMA distance user
> customization, and all nodes are in the same distance to each other.
> 
> To enhance NUMA distance support in the pSeries machine we need to
> make this limit flexible. This patch rewrites the max-associativity
> logic to consider that multiple associativity domains can co-exist
> in the same NUMA level. We're using the legacy_numa() helper to
> avoid leaking unneeded guest changes.


Hrm.  I find the above a bit hard to understand.  Having the limit be
one less than the number of nodes at every level except the last seems
kind of odd to me.

> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 073a59c47d..b0c4b80a23 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -919,13 +919,20 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>          cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE & 0xffffffff),
>          cpu_to_be32(ms->smp.max_cpus / ms->smp.threads),
>      };
> -    uint32_t maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
> +
> +    /* The maximum domains for a given NUMA level, supposing that every
> +     * additional NUMA node belongs to the same domain (aside from the
> +     * 4th level, where we must support all available NUMA domains), is
> +     * total number of domains - 1. */
> +    uint32_t total_nodes_number = ms->numa_state->num_nodes +
> +                                  spapr->extra_numa_nodes;
> +    uint32_t maxdomain = cpu_to_be32(total_nodes_number - 1);
>      uint32_t maxdomains[] = {
>          cpu_to_be32(4),
>          maxdomain,
>          maxdomain,
>          maxdomain,
> -        cpu_to_be32(ms->numa_state->num_nodes + spapr->extra_numa_nodes),
> +        cpu_to_be32(total_nodes_number),
>      };
>  
>      _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
> @@ -962,6 +969,13 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>                       qemu_hypertas->str, qemu_hypertas->len));
>      g_string_free(qemu_hypertas, TRUE);
>  
> +    if (spapr_machine_using_legacy_numa(spapr)) {
> +        maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
> +        maxdomains[1] = maxdomain;
> +        maxdomains[2] = maxdomain;
> +        maxdomains[3] = maxdomain;
> +    }
> +
>      if (smc->pre_5_1_assoc_refpoints) {
>          nr_refpoints = 2;
>      }
Daniel Henrique Barboza Aug. 26, 2020, 9:17 p.m. UTC | #2
On 8/19/20 11:55 PM, David Gibson wrote:
> On Fri, Aug 14, 2020 at 05:54:19PM -0300, Daniel Henrique Barboza wrote:
>> The ibm,max-associativity-domains is considering that only a single
>> associativity domain can exist in the same NUMA level. This is true
>> today because we do not support any type of NUMA distance user
>> customization, and all nodes are in the same distance to each other.
>>
>> To enhance NUMA distance support in the pSeries machine we need to
>> make this limit flexible. This patch rewrites the max-associativity
>> logic to consider that multiple associativity domains can co-exist
>> in the same NUMA level. We're using the legacy_numa() helper to
>> avoid leaking unneeded guest changes.
> 
> 
> Hrm.  I find the above a bit hard to understand.  Having the limit be
> one less than the number of nodes at every level except the last seems
> kind of odd to me.

I took a bit to reply on this because I was reconsidering this logic.

I tried to "not be greedy" with this maximum number and ended up doing
something that breaks in a simple scenario. Today, in a single conf with
a single NUMA node with a single CPU, and say 2 GPUs, given that all GPUs
are in their own associativity domains, we would have something like:

cpu0: 0 0 0 0 0 0
gpu1: gpu_1 gpu_1 gpu_1 gpu_1
gpu2: gpu_2 gpu_2 gpu_2 gpu_2

This would already break apart what I did there. I think we should simplify
and just set maxdomains to be all nodes in all levels, like we do today
but using spapr->gpu_numa_id as an alias to maxnodes.


Thanks,

DHB

> 
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr.c | 18 ++++++++++++++++--
>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 073a59c47d..b0c4b80a23 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -919,13 +919,20 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>>           cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE & 0xffffffff),
>>           cpu_to_be32(ms->smp.max_cpus / ms->smp.threads),
>>       };
>> -    uint32_t maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
>> +
>> +    /* The maximum domains for a given NUMA level, supposing that every
>> +     * additional NUMA node belongs to the same domain (aside from the
>> +     * 4th level, where we must support all available NUMA domains), is
>> +     * total number of domains - 1. */
>> +    uint32_t total_nodes_number = ms->numa_state->num_nodes +
>> +                                  spapr->extra_numa_nodes;
>> +    uint32_t maxdomain = cpu_to_be32(total_nodes_number - 1);
>>       uint32_t maxdomains[] = {
>>           cpu_to_be32(4),
>>           maxdomain,
>>           maxdomain,
>>           maxdomain,
>> -        cpu_to_be32(ms->numa_state->num_nodes + spapr->extra_numa_nodes),
>> +        cpu_to_be32(total_nodes_number),
>>       };
>>   
>>       _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
>> @@ -962,6 +969,13 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
>>                        qemu_hypertas->str, qemu_hypertas->len));
>>       g_string_free(qemu_hypertas, TRUE);
>>   
>> +    if (spapr_machine_using_legacy_numa(spapr)) {
>> +        maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
>> +        maxdomains[1] = maxdomain;
>> +        maxdomains[2] = maxdomain;
>> +        maxdomains[3] = maxdomain;
>> +    }
>> +
>>       if (smc->pre_5_1_assoc_refpoints) {
>>           nr_refpoints = 2;
>>       }
>
diff mbox series

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 073a59c47d..b0c4b80a23 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -919,13 +919,20 @@  static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
         cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE & 0xffffffff),
         cpu_to_be32(ms->smp.max_cpus / ms->smp.threads),
     };
-    uint32_t maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
+
+    /* The maximum domains for a given NUMA level, supposing that every
+     * additional NUMA node belongs to the same domain (aside from the
+     * 4th level, where we must support all available NUMA domains), is
+     * total number of domains - 1. */
+    uint32_t total_nodes_number = ms->numa_state->num_nodes +
+                                  spapr->extra_numa_nodes;
+    uint32_t maxdomain = cpu_to_be32(total_nodes_number - 1);
     uint32_t maxdomains[] = {
         cpu_to_be32(4),
         maxdomain,
         maxdomain,
         maxdomain,
-        cpu_to_be32(ms->numa_state->num_nodes + spapr->extra_numa_nodes),
+        cpu_to_be32(total_nodes_number),
     };
 
     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
@@ -962,6 +969,13 @@  static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
                      qemu_hypertas->str, qemu_hypertas->len));
     g_string_free(qemu_hypertas, TRUE);
 
+    if (spapr_machine_using_legacy_numa(spapr)) {
+        maxdomain = cpu_to_be32(spapr->extra_numa_nodes > 1 ? 1 : 0);
+        maxdomains[1] = maxdomain;
+        maxdomains[2] = maxdomain;
+        maxdomains[3] = maxdomain;
+    }
+
     if (smc->pre_5_1_assoc_refpoints) {
         nr_refpoints = 2;
     }