[v5,0/6] Add support for FORM2 associativity

Message ID 20210628151117.545935-1-aneesh.kumar@linux.ibm.com (mailing list archive)

Message

Aneesh Kumar K V June 28, 2021, 3:11 p.m. UTC
Form2 associativity adds a much more flexible NUMA topology layout
than what is provided by Form1. More details can be found in patch 7.

$ numactl -H
...
node distances:
node   0   1   2   3
  0:  10  11 222  33
  1:  44  10  55  66
  2:  77  88  10  99
  3: 101 121 132  10
$

After DAX kmem memory add
# numactl -H
available: 5 nodes (0-4)
...
node distances:
node   0   1   2   3   4
  0:  10  11 222  33 240
  1:  44  10  55  66 255
  2:  77  88  10  99 255
  3: 101 121 132  10 230
  4: 255 255 255 230  10


PAPR SCM now uses the NUMA distance details to find the numa_node and target_node
for the device.

kvaneesh@ubuntu-guest:~$ ndctl  list -N -v 
[
  {
    "dev":"namespace0.0",
    "mode":"devdax",
    "map":"dev",
    "size":1071644672,
    "uuid":"d333d867-3f57-44c8-b386-d4d3abdc2bf2",
    "raw_uuid":"915361ad-fe6a-42dd-848f-d6dc9f5af362",
    "daxregion":{
      "id":0,
      "size":1071644672,
      "devices":[
        {
          "chardev":"dax0.0",
          "size":1071644672,
          "target_node":4,
          "mode":"devdax"
        }
      ]
    },
    "align":2097152,
    "numa_node":3
  }
]
kvaneesh@ubuntu-guest:~$ 


The above output was produced with the following QEMU command line:

-numa node,nodeid=4 \
-numa dist,src=0,dst=1,val=11 -numa dist,src=0,dst=2,val=222 -numa dist,src=0,dst=3,val=33 -numa dist,src=0,dst=4,val=240 \
-numa dist,src=1,dst=0,val=44 -numa dist,src=1,dst=2,val=55 -numa dist,src=1,dst=3,val=66 -numa dist,src=1,dst=4,val=255 \
-numa dist,src=2,dst=0,val=77 -numa dist,src=2,dst=1,val=88 -numa dist,src=2,dst=3,val=99 -numa dist,src=2,dst=4,val=255 \
-numa dist,src=3,dst=0,val=101 -numa dist,src=3,dst=1,val=121 -numa dist,src=3,dst=2,val=132 -numa dist,src=3,dst=4,val=230 \
-numa dist,src=4,dst=0,val=255 -numa dist,src=4,dst=1,val=255 -numa dist,src=4,dst=2,val=255 -numa dist,src=4,dst=3,val=230 \
-object memory-backend-file,id=memnvdimm1,prealloc=yes,mem-path=$PMEM_DISK,share=yes,size=${PMEM_SIZE}  \
-device nvdimm,label-size=128K,memdev=memnvdimm1,id=nvdimm1,slot=4,uuid=72511b67-0b3b-42fd-8d1d-5be3cae8bcaa,node=4

The QEMU changes can be found at https://lore.kernel.org/qemu-devel/20210616011944.2996399-1-danielhb413@gmail.com/

Changes from v4:
* Drop the DLPAR related device tree property for now because both QEMU and PowerVM
  will provide the distance details of all possible NUMA nodes during boot.
* Rework numa distance code based on review feedback.

Changes from v3:
* Drop PAPR SCM specific changes and depend completely on NUMA distance information.

Changes from v2:
* Add nvdimm list to Cc:
* Update PATCH 8 commit message.

Changes from v1:
* Update FORM2 documentation.
* Rename max_domain_index to max_associativity_domain_index


Aneesh Kumar K.V (6):
  powerpc/pseries: rename min_common_depth to primary_domain_index
  powerpc/pseries: rename distance_ref_points_depth to
    max_associativity_domain_index
  powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
  powerpc/pseries: Consolidate different NUMA distance update code paths
  powerpc/pseries: Add a helper for form1 cpu distance
  powerpc/pseries: Add support for FORM2 associativity

 Documentation/powerpc/associativity.rst       | 103 +++++
 arch/powerpc/include/asm/firmware.h           |   7 +-
 arch/powerpc/include/asm/prom.h               |   3 +-
 arch/powerpc/include/asm/topology.h           |   4 +-
 arch/powerpc/kernel/prom_init.c               |   3 +-
 arch/powerpc/mm/numa.c                        | 415 +++++++++++++-----
 arch/powerpc/platforms/pseries/firmware.c     |   3 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |   2 +
 .../platforms/pseries/hotplug-memory.c        |   2 +
 arch/powerpc/platforms/pseries/lpar.c         |   4 +-
 arch/powerpc/platforms/pseries/pseries.h      |   1 +
 11 files changed, 432 insertions(+), 115 deletions(-)
 create mode 100644 Documentation/powerpc/associativity.rst

Comments

Daniel Henrique Barboza July 13, 2021, 2:27 p.m. UTC | #1
Aneesh,

This series compiles with a configuration made with "pseries_le_defconfig"
but fails with a config based on an existing RHEL8 config.

The reason, which is hinted at in the robot replies to patch 4, is that you defined
"__vphn_get_associativity" inside an #ifdef CONFIG_PPC_SPLPAR guard but didn't
define how the function would behave without the config, and you ended up
using the function elsewhere.

This fixes the compilation but I'm not sure if this is what you intended
for this function:


diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index c68846fc9550..6e8551d16b7a 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -680,6 +680,11 @@ static int vphn_get_nid(long lcpu)
  
  }
  #else
+static int __vphn_get_associativity(long lcpu, __be32 *associativity)
+{
+       return -1;
+}
+
  static int vphn_get_nid(long unused)
  {
         return NUMA_NO_NODE;


I'll post a new version of the QEMU FORM2 changes using these patches as is (with
the above fixup), but I guess you'll want to post a v6.

Thanks,

Daniel

Aneesh Kumar K V July 13, 2021, 2:30 p.m. UTC | #2
On 7/13/21 7:57 PM, Daniel Henrique Barboza wrote:
> Aneesh,
> 
> This series compiles with a configuration made with "pseries_le_defconfig"
> but fails with a config based on an existing RHEL8 config.
> 
> The reason, which is hinted in the robot replies in patch 4, is that you defined
> a "__vphn_get_associativity" inside a #ifdef CONFIG_PPC_SPLPAR guard but didn't
> define how the function would behave without the config, and you ended up
> using the function elsewhere.
> 
> This fixes the compilation but I'm not sure if this is what you intended
> for this function:
> 
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index c68846fc9550..6e8551d16b7a 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -680,6 +680,11 @@ static int vphn_get_nid(long lcpu)
> 
>   }
>   #else
> +static int __vphn_get_associativity(long lcpu, __be32 *associativity)
> +{
> +       return -1;
> +}
> +
>   static int vphn_get_nid(long unused)
>   {
>          return NUMA_NO_NODE;
> 
> 
> I'll post a new version of the QEMU FORM2 changes using these patches as is (with
> the above fixup), but I guess you'll want to post a v6.
> 

The kernel test robot did report that earlier, and I have it fixed in my
local tree. I haven't posted v6 yet because I want to close the review
of the approach with the v5 patchset first.

-aneesh