diff mbox series

[V2,2/3] pseries/findnodes: Find nodes with memory for memoryless nodes

Message ID bcccffb8-8472-463d-cca5-698db5cbc8e1@linux.vnet.ibm.com (mailing list archive)
State Rejected
Headers show
Series pseries/nodes: Fix issues with memoryless nodes | expand

Commit Message

Michael Bringmann Oct. 18, 2017, 8:09 p.m. UTC
pseries/findnodes: On pseries systems which allow 'hot-add' of
resources, we may boot configurations that have CPUs, but no memory
associated to a node by the affinity calculations.  Previously, the
software took a shortcut to collapse initialization and references
to such memoryless nodes with other nodes that did have memory
associated with them at boot.  This patch is based on fixes that
allow the proper initialization and distinguishment of memoryless
and memory-plus nodes after NUMA initialization.  It extends the
use of the 'node_to_mem_node()' API from 'topology.h' to modules
that are allocating node-specific memory at boot, and allows such
references to find available memory in another node.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 block/blk-mq-cpumap.c |    3 ++-
 mm/page_alloc.c       |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Michael Ellerman Oct. 19, 2017, 8:56 a.m. UTC | #1
Hi Michael,

Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> pseries/findnodes: On pseries systems which allow 'hot-add' of

This isn't a powerpc or pseries patch, so the subject/prefix is wrong.

Also because you're changing generic code you need to provide an
explanation that makes sense in general, across all architectures, not
just in terms of what the pseries platform does.

> resources, we may boot configurations that have CPUs, but no memory
> associated to a node by the affinity calculations.

This is called a "memory-less node" and is understood by the generic
code.

> Previously, the
> software took a shortcut to collapse initialization and references

What software? What shortcut?

> to such memoryless nodes with other nodes that did have memory
> associated with them at boot.  This patch is based on fixes that

What fixes?

> allow the proper initialization and distinguishment of memoryless
> and memory-plus nodes after NUMA initialization.

What exactly is unproper about the current code?

> It extends the 
> use of the 'node_to_mem_node()' API from 'topology.h' to modules

The term "modules" has a specific meaning in Linux which is not correct
here. We would just say "in two functions" or "in two files".

> that are allocating node-specific memory at boot, and allows such
> references to find available memory in another node.


> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9f8cffc..a27a31f 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -73,7 +73,8 @@ int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
>  
>  	for_each_possible_cpu(i) {
>  		if (index == mq_map[i])
> -			return local_memory_node(cpu_to_node(i));
> +			return local_memory_node(
> +					node_to_mem_node(cpu_to_node(i)));

What is this trying to do?

local_memory_node() is supposed to return a "local" node for nodes with
no memory.

And in fact the comment says:

  * Used for initializing percpu 'numa_mem'

Which is what we do:

	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));

And is what's returned by node_to_mem_node():

  static inline void set_numa_mem(int node)
  {
  	this_cpu_write(_numa_mem_, node);
  	_node_numa_mem_[numa_node_id()] = node;
  }
  
  static inline int node_to_mem_node(int node)
  {
  	return _node_numa_mem_[node];
  }

So your change effectively ends up doing:

	return local_memory_node(local_memory_node(cpu_to_node(i)));

Which doesn't look right.


cheers
Michael Bringmann Nov. 15, 2017, 5:38 p.m. UTC | #2
Hello:
    Sorry for the out-of-date description.  This entire patch has been
removed / eliminated from subsequent patch sets.  All changes to correct
powerpc memoryless nodes will be confined to powerpc-specific code.
    Regards,
Michael

On 10/19/2017 03:56 AM, Michael Ellerman wrote:
> Hi Michael,
> 
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>> pseries/findnodes: On pseries systems which allow 'hot-add' of
> 
> This isn't a powerpc or pseries patch, so the subject/prefix is wrong.
> 
> Also because you're changing generic code you need to provide an
> explanation that makes sense in general, across all architectures, not
> just in terms of what the pseries platform does.
> 
>> resources, we may boot configurations that have CPUs, but no memory
>> associated to a node by the affinity calculations.
> 
> This is called a "memory-less node" and is understood by the generic
> code.
> 
>> Previously, the
>> software took a shortcut to collapse initialization and references
> 
> What software? What shortcut?
> 
>> to such memoryless nodes with other nodes that did have memory
>> associated with them at boot.  This patch is based on fixes that
> 
> What fixes?
> 
>> allow the proper initialization and distinguishment of memoryless
>> and memory-plus nodes after NUMA initialization.
> 
> What exactly is unproper about the current code?
> 
>> It extends the 
>> use of the 'node_to_mem_node()' API from 'topology.h' to modules
> 
> The term "modules" has a specific meaning in Linux which is not correct
> here. We would just say "in two functions" or "in two files".
> 
>> that are allocating node-specific memory at boot, and allows such
>> references to find available memory in another node.
> 
> 
>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>> index 9f8cffc..a27a31f 100644
>> --- a/block/blk-mq-cpumap.c
>> +++ b/block/blk-mq-cpumap.c
>> @@ -73,7 +73,8 @@ int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
>>  
>>  	for_each_possible_cpu(i) {
>>  		if (index == mq_map[i])
>> -			return local_memory_node(cpu_to_node(i));
>> +			return local_memory_node(
>> +					node_to_mem_node(cpu_to_node(i)));
> 
> What is this trying to do?
> 
> local_memory_node() is supposed to return a "local" node for nodes with
> no memory.
> 
> And in fact the comment says:
> 
>   * Used for initializing percpu 'numa_mem'
> 
> Which is what we do:
> 
> 	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> 
> And is what's returned by node_to_mem_node():
> 
>   static inline void set_numa_mem(int node)
>   {
>   	this_cpu_write(_numa_mem_, node);
>   	_node_numa_mem_[numa_node_id()] = node;
>   }
>   
>   static inline int node_to_mem_node(int node)
>   {
>   	return _node_numa_mem_[node];
>   }
> 
> So your change effectively ends up doing:
> 
> 	return local_memory_node(local_memory_node(cpu_to_node(i)));
> 
> Which doesn't look right.
> 
> 
> cheers
> 
>
diff mbox series

Patch

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9f8cffc..a27a31f 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -73,7 +73,8 @@  int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
 
 	for_each_possible_cpu(i) {
 		if (index == mq_map[i])
-			return local_memory_node(cpu_to_node(i));
+			return local_memory_node(
+					node_to_mem_node(cpu_to_node(i)));
 	}
 
 	return NUMA_NO_NODE;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c..e7aaa2a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4188,6 +4188,7 @@  struct page *
 
 	gfp_mask &= gfp_allowed_mask;
 	alloc_mask = gfp_mask;
+	preferred_nid = node_to_mem_node(preferred_nid);
 	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
 		return NULL;