diff mbox series

[1/2] powerpc/nodes: Ensure enough nodes avail for operations

Message ID d62dd6c7-7676-5fe0-4f63-baada9213908@linux.vnet.ibm.com (mailing list archive)
State Superseded
Headers show
Series powerpc/nodes/hotplug: Fix problem with memoryless nodes | expand

Commit Message

Michael Bringmann Sept. 18, 2017, 6:28 p.m. UTC
powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
or memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized
at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the "rtas" device tree property
"ibm,current-associativity-domains" or the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the property is not present at boot, no operation will be performed
to define or enable additional nodes.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

Comments

Michael Ellerman Oct. 16, 2017, 12:33 p.m. UTC | #1
Michael Bringmann <mwb@linux.vnet.ibm.com> writes:

> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU

This is a powerpc-only patch, so saying "systems like PowerPC" is
confusing. What you should be saying is "On pseries systems".

> or memory resources, it may occur that the new resources are to be
> inserted into nodes that were not used for these resources at bootup.
> In the kernel, any node that is used must be defined and initialized
> at boot.
>
> This patch extracts the value of the lowest domain level (number of
> allocable resources) from the "rtas" device tree property
> "ibm,current-associativity-domains" or the device tree property

What is current associativity domains? I've not heard of it, where is it
documented, and what does it mean.

Why would use the "current" set vs the "max"? I thought the whole point
was to discover the maximum possible set of nodes that could be
hotplugged.

> "ibm,max-associativity-domains" to use as the maximum number of nodes
> to setup as possibly available in the system.  This new setting will
> override the instruction,
>
>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>
> If the property is not present at boot, no operation will be performed
> to define or enable additional nodes.
>
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index ec098b3..b385cd0 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>  }
>  
> +static void __init node_associativity_setup(void)

This should really be called "find_possible_nodes()" or something more
descriptive.

> +{
> +	struct device_node *rtas;
> +
> +	rtas = of_find_node_by_path("/rtas");
> +	if (rtas) {

If you just short-circuit that return the whole function body can be
deintented, making it significantly more readable.

ie:
+	rtas = of_find_node_by_path("/rtas");
+	if (!rtas)
+		return;

> +		const __be32 *prop;
> +		u32 len, entries, numnodes, i;
> +
> +		prop = of_get_property(rtas,
> +					"ibm,current-associativity-domains", &len);

Please don't use of_get_property() in new code, we have much better
accessors these days, which do better error checking and handle the
endian conversions for you.

In this case you'd use eg:

	u32 entries;
	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);

> +		if (!prop || len < sizeof(unsigned int)) {
> +			prop = of_get_property(rtas,
> +					"ibm,max-associativity-domains", &len);
> +			goto endit;
> +		}
> +
> +		entries = of_read_number(prop++, 1);
> +
> +		if (len < (entries * sizeof(unsigned int)))
> +			goto endit;
> +
> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
> +			entries = min_common_depth;
> +		else
> +			entries -= 1;
			^
                        You can't just guess that will be the right entry.

If min_common_depth is < 0 the function should have just returned
immediately at the top.

If min_common_depth is outside the range of the property that's a buggy
device tree, you should print a warning and return.

> +		numnodes = of_read_number(&prop[entries], 1);

	u32 num_nodes;
	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
> +
> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
> +			min_common_depth);
> +
> +		for (i = 0; i < numnodes; i++) {
> +			if (!node_possible(i)) {
> +				setup_node_data(i, 0, 0);

Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
it will not be allocated node local, which sucks.

> +				node_set(i, node_possible_map);
> +			}
> +		}
> +	}
> +
> +endit:

"out" would be the normal name.

> +	if (rtas)
> +		of_node_put(rtas);
> +}
> +
>  void __init initmem_init(void)
>  {
>  	int nid, cpu;
> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>  	 */

You need to update the comment above here which is contradicted by the
new function you're adding.

>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>  
> +	node_associativity_setup();
> +
>  	for_each_online_node(nid) {
>  		unsigned long start_pfn, end_pfn;
>  

cheers
Michael Bringmann Oct. 17, 2017, 4:14 p.m. UTC | #2
See below.

On 10/16/2017 07:33 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> 
>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
> 
> This is a powerpc-only patch, so saying "systems like PowerPC" is
> confusing. What you should be saying is "On pseries systems".
> 
>> or memory resources, it may occur that the new resources are to be
>> inserted into nodes that were not used for these resources at bootup.
>> In the kernel, any node that is used must be defined and initialized
>> at boot.
>>
>> This patch extracts the value of the lowest domain level (number of
>> allocable resources) from the "rtas" device tree property
>> "ibm,current-associativity-domains" or the device tree property
> 
> What is current associativity domains? I've not heard of it, where is it
> documented, and what does it mean.
> 
> Why would use the "current" set vs the "max"? I thought the whole point
> was to discover the maximum possible set of nodes that could be
> hotplugged.
> 
>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>> to setup as possibly available in the system.  This new setting will
>> override the instruction,
>>
>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>
>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>
>> If the property is not present at boot, no operation will be performed
>> to define or enable additional nodes.
>>
>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index ec098b3..b385cd0 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>  }
>>  
>> +static void __init node_associativity_setup(void)
> 
> This should really be called "find_possible_nodes()" or something more
> descriptive.

Okay.
> 
>> +{
>> +	struct device_node *rtas;
>> +
>> +	rtas = of_find_node_by_path("/rtas");
>> +	if (rtas) {
> 
> If you just short-circuit that return the whole function body can be
> deintented, making it significantly more readable.
> 
> ie:
> +	rtas = of_find_node_by_path("/rtas");
> +	if (!rtas)
> +		return;

Okay.

> 
>> +		const __be32 *prop;
>> +		u32 len, entries, numnodes, i;
>> +
>> +		prop = of_get_property(rtas,
>> +					"ibm,current-associativity-domains", &len);
> 
> Please don't use of_get_property() in new code, we have much better
> accessors these days, which do better error checking and handle the
> endian conversions for you.
> 
> In this case you'd use eg:
> 
> 	u32 entries;
> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);

The property 'ibm,current-associativity-domains' has the same format as the property
'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
however, expects it to be an integer singleton value.  Instead, it needs:

> 
>> +		if (!prop || len < sizeof(unsigned int)) {
>> +			prop = of_get_property(rtas,
>> +					"ibm,max-associativity-domains", &len);
                        if (!prop || len < sizeof(unsigned int))
>> +				goto endit;
>> +		}
>> +
>> +		entries = of_read_number(prop++, 1);
>> +
>> +		if (len < (entries * sizeof(unsigned int)))
>> +			goto endit;
>> +
>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>> +			entries = min_common_depth;
>> +		else
>> +			entries -= 1;
> 			^
>                         You can't just guess that will be the right entry.
> 
> If min_common_depth is < 0 the function should have just returned
> immediately at the top.

Okay.

> 
> If min_common_depth is outside the range of the property that's a buggy
> device tree, you should print a warning and return.
> 
>> +		numnodes = of_read_number(&prop[entries], 1);
> 
> 	u32 num_nodes;
> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>> +
>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>> +			min_common_depth);
>> +
>> +		for (i = 0; i < numnodes; i++) {
>> +			if (!node_possible(i)) {
>> +				setup_node_data(i, 0, 0);
> 
> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
> it will not be allocated node local, which sucks.

Okay.

> 
>> +				node_set(i, node_possible_map);
>> +			}
>> +		}
>> +	}
>> +
>> +endit:
> 
> "out" would be the normal name.

Okay.

> 
>> +	if (rtas)
>> +		of_node_put(rtas);
>> +}
>> +
>>  void __init initmem_init(void)
>>  {
>>  	int nid, cpu;
>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>  	 */
> 
> You need to update the comment above here which is contradicted by the
> new function you're adding.

Okay.

> 
>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>  
>> +	node_associativity_setup();
>> +
>>  	for_each_online_node(nid) {
>>  		unsigned long start_pfn, end_pfn;
>>  
> 
> cheers
> 
>
Nathan Fontenot Oct. 17, 2017, 5:02 p.m. UTC | #3
On 10/17/2017 11:14 AM, Michael Bringmann wrote:
> See below.
> 
> On 10/16/2017 07:33 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>
>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
>>
>> This is a powerpc-only patch, so saying "systems like PowerPC" is
>> confusing. What you should be saying is "On pseries systems".
>>
>>> or memory resources, it may occur that the new resources are to be
>>> inserted into nodes that were not used for these resources at bootup.
>>> In the kernel, any node that is used must be defined and initialized
>>> at boot.
>>>
>>> This patch extracts the value of the lowest domain level (number of
>>> allocable resources) from the "rtas" device tree property
>>> "ibm,current-associativity-domains" or the device tree property
>>
>> What is current associativity domains? I've not heard of it, where is it
>> documented, and what does it mean.
>>
>> Why would use the "current" set vs the "max"? I thought the whole point
>> was to discover the maximum possible set of nodes that could be
>> hotplugged.
>>
>>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>>> to setup as possibly available in the system.  This new setting will
>>> override the instruction,
>>>
>>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>
>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>>
>>> If the property is not present at boot, no operation will be performed
>>> to define or enable additional nodes.
>>>
>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 47 insertions(+)
>>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index ec098b3..b385cd0 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>>  }
>>>  
>>> +static void __init node_associativity_setup(void)
>>
>> This should really be called "find_possible_nodes()" or something more
>> descriptive.
> 
> Okay.
>>
>>> +{
>>> +	struct device_node *rtas;
>>> +
>>> +	rtas = of_find_node_by_path("/rtas");
>>> +	if (rtas) {
>>
>> If you just short-circuit that return the whole function body can be
>> deintented, making it significantly more readable.
>>
>> ie:
>> +	rtas = of_find_node_by_path("/rtas");
>> +	if (!rtas)
>> +		return;
> 
> Okay.
> 
>>
>>> +		const __be32 *prop;
>>> +		u32 len, entries, numnodes, i;
>>> +
>>> +		prop = of_get_property(rtas,
>>> +					"ibm,current-associativity-domains", &len);
>>
>> Please don't use of_get_property() in new code, we have much better
>> accessors these days, which do better error checking and handle the
>> endian conversions for you.
>>
>> In this case you'd use eg:
>>
>> 	u32 entries;
>> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);
> 
> The property 'ibm,current-associativity-domains' has the same format as the property
> 'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
> however, expects it to be an integer singleton value.  Instead, it needs:

I think for this case where the property is an array of values you could use
of_property_count_elems_of_size() to get the number of elements in the array
and then use of_property_read_u32_array() to read the array.

-Nathan

> 
>>
>>> +		if (!prop || len < sizeof(unsigned int)) {
>>> +			prop = of_get_property(rtas,
>>> +					"ibm,max-associativity-domains", &len);
>                         if (!prop || len < sizeof(unsigned int))
>>> +				goto endit;
>>> +		}
>>> +
>>> +		entries = of_read_number(prop++, 1);
>>> +
>>> +		if (len < (entries * sizeof(unsigned int)))
>>> +			goto endit;
>>> +
>>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>>> +			entries = min_common_depth;
>>> +		else
>>> +			entries -= 1;
>> 			^
>>                         You can't just guess that will be the right entry.
>>
>> If min_common_depth is < 0 the function should have just returned
>> immediately at the top.
> 
> Okay.
> 
>>
>> If min_common_depth is outside the range of the property that's a buggy
>> device tree, you should print a warning and return.
>>
>>> +		numnodes = of_read_number(&prop[entries], 1);
>>
>> 	u32 num_nodes;
>> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>>> +
>>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>>> +			min_common_depth);
>>> +
>>> +		for (i = 0; i < numnodes; i++) {
>>> +			if (!node_possible(i)) {
>>> +				setup_node_data(i, 0, 0);
>>
>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
>> it will not be allocated node local, which sucks.
> 
> Okay.
> 
>>
>>> +				node_set(i, node_possible_map);
>>> +			}
>>> +		}
>>> +	}
>>> +
>>> +endit:
>>
>> "out" would be the normal name.
> 
> Okay.
> 
>>
>>> +	if (rtas)
>>> +		of_node_put(rtas);
>>> +}
>>> +
>>>  void __init initmem_init(void)
>>>  {
>>>  	int nid, cpu;
>>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>>  	 */
>>
>> You need to update the comment above here which is contradicted by the
>> new function you're adding.
> 
> Okay.
> 
>>
>>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>  
>>> +	node_associativity_setup();
>>> +
>>>  	for_each_online_node(nid) {
>>>  		unsigned long start_pfn, end_pfn;
>>>  
>>
>> cheers
>>
>>
>
Michael Bringmann Oct. 17, 2017, 5:22 p.m. UTC | #4
On 10/17/2017 12:02 PM, Nathan Fontenot wrote:
> On 10/17/2017 11:14 AM, Michael Bringmann wrote:
>> See below.
>>
>> On 10/16/2017 07:33 AM, Michael Ellerman wrote:
>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>
>>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
>>>
>>> This is a powerpc-only patch, so saying "systems like PowerPC" is
>>> confusing. What you should be saying is "On pseries systems".
>>>
>>>> or memory resources, it may occur that the new resources are to be
>>>> inserted into nodes that were not used for these resources at bootup.
>>>> In the kernel, any node that is used must be defined and initialized
>>>> at boot.
>>>>
>>>> This patch extracts the value of the lowest domain level (number of
>>>> allocable resources) from the "rtas" device tree property
>>>> "ibm,current-associativity-domains" or the device tree property
>>>
>>> What is current associativity domains? I've not heard of it, where is it
>>> documented, and what does it mean.
>>>
>>> Why would use the "current" set vs the "max"? I thought the whole point
>>> was to discover the maximum possible set of nodes that could be
>>> hotplugged.
>>>
>>>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>>>> to setup as possibly available in the system.  This new setting will
>>>> override the instruction,
>>>>
>>>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>
>>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>>>
>>>> If the property is not present at boot, no operation will be performed
>>>> to define or enable additional nodes.
>>>>
>>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 47 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>> index ec098b3..b385cd0 100644
>>>> --- a/arch/powerpc/mm/numa.c
>>>> +++ b/arch/powerpc/mm/numa.c
>>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>>>  }
>>>>  
>>>> +static void __init node_associativity_setup(void)
>>>
>>> This should really be called "find_possible_nodes()" or something more
>>> descriptive.
>>
>> Okay.
>>>
>>>> +{
>>>> +	struct device_node *rtas;
>>>> +
>>>> +	rtas = of_find_node_by_path("/rtas");
>>>> +	if (rtas) {
>>>
>>> If you just short-circuit that return the whole function body can be
>>> deintented, making it significantly more readable.
>>>
>>> ie:
>>> +	rtas = of_find_node_by_path("/rtas");
>>> +	if (!rtas)
>>> +		return;
>>
>> Okay.
>>
>>>
>>>> +		const __be32 *prop;
>>>> +		u32 len, entries, numnodes, i;
>>>> +
>>>> +		prop = of_get_property(rtas,
>>>> +					"ibm,current-associativity-domains", &len);
>>>
>>> Please don't use of_get_property() in new code, we have much better
>>> accessors these days, which do better error checking and handle the
>>> endian conversions for you.
>>>
>>> In this case you'd use eg:
>>>
>>> 	u32 entries;
>>> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);
>>
>> The property 'ibm,current-associativity-domains' has the same format as the property
>> 'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
>> however, expects it to be an integer singleton value.  Instead, it needs:
> 
> I think for this case where the property is an array of values you could use
> of_property_count_elems_of_size() to get the number of elements in the array
> and then use of_property_read_u32_array() to read the array.
> 
> -Nathan

We only need one value from the array which is why I am using,

>>>> +		numnodes = of_read_number(&prop[min_common_depth], 1);

With this implementation I do not need to allocate memory for
an array, nor execute code to read all elements of the array.

Michael

> 
>>
>>>
>>>> +		if (!prop || len < sizeof(unsigned int)) {
>>>> +			prop = of_get_property(rtas,
>>>> +					"ibm,max-associativity-domains", &len);
>>                         if (!prop || len < sizeof(unsigned int))
>>>> +				goto endit;
>>>> +		}
>>>> +
>>>> +		entries = of_read_number(prop++, 1);
>>>> +
>>>> +		if (len < (entries * sizeof(unsigned int)))
>>>> +			goto endit;
>>>> +
>>>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>>>> +			entries = min_common_depth;
>>>> +		else
>>>> +			entries -= 1;
>>> 			^
>>>                         You can't just guess that will be the right entry.
>>>
>>> If min_common_depth is < 0 the function should have just returned
>>> immediately at the top.
>>
>> Okay.
>>
>>>
>>> If min_common_depth is outside the range of the property that's a buggy
>>> device tree, you should print a warning and return.
>>>
>>>> +		numnodes = of_read_number(&prop[entries], 1);
>>>
>>> 	u32 num_nodes;
>>> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>>>> +
>>>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>>>> +			min_common_depth);
>>>> +
>>>> +		for (i = 0; i < numnodes; i++) {
>>>> +			if (!node_possible(i)) {
>>>> +				setup_node_data(i, 0, 0);
>>>
>>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
>>> it will not be allocated node local, which sucks.
>>
>> Okay.
>>
>>>
>>>> +				node_set(i, node_possible_map);
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +
>>>> +endit:
>>>
>>> "out" would be the normal name.
>>
>> Okay.
>>
>>>
>>>> +	if (rtas)
>>>> +		of_node_put(rtas);
>>>> +}
>>>> +
>>>>  void __init initmem_init(void)
>>>>  {
>>>>  	int nid, cpu;
>>>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>>>  	 */
>>>
>>> You need to update the comment above here which is contradicted by the
>>> new function you're adding.
>>
>> Okay.
>>
>>>
>>>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>  
>>>> +	node_associativity_setup();
>>>> +
>>>>  	for_each_online_node(nid) {
>>>>  		unsigned long start_pfn, end_pfn;
>>>>  
>>>
>>> cheers
>>>
>>>
>>
> 
>
Nathan Fontenot Oct. 17, 2017, 5:41 p.m. UTC | #5
On 10/17/2017 12:22 PM, Michael Bringmann wrote:
> 
> 
> On 10/17/2017 12:02 PM, Nathan Fontenot wrote:
>> On 10/17/2017 11:14 AM, Michael Bringmann wrote:
>>> See below.
>>>
>>> On 10/16/2017 07:33 AM, Michael Ellerman wrote:
>>>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>>>
>>>>> powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
>>>>
>>>> This is a powerpc-only patch, so saying "systems like PowerPC" is
>>>> confusing. What you should be saying is "On pseries systems".
>>>>
>>>>> or memory resources, it may occur that the new resources are to be
>>>>> inserted into nodes that were not used for these resources at bootup.
>>>>> In the kernel, any node that is used must be defined and initialized
>>>>> at boot.
>>>>>
>>>>> This patch extracts the value of the lowest domain level (number of
>>>>> allocable resources) from the "rtas" device tree property
>>>>> "ibm,current-associativity-domains" or the device tree property
>>>>
>>>> What is current associativity domains? I've not heard of it, where is it
>>>> documented, and what does it mean.
>>>>
>>>> Why would use the "current" set vs the "max"? I thought the whole point
>>>> was to discover the maximum possible set of nodes that could be
>>>> hotplugged.
>>>>
>>>>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>>>>> to setup as possibly available in the system.  This new setting will
>>>>> override the instruction,
>>>>>
>>>>>     nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>>
>>>>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>>>>
>>>>> If the property is not present at boot, no operation will be performed
>>>>> to define or enable additional nodes.
>>>>>
>>>>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>>>>> ---
>>>>>  arch/powerpc/mm/numa.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 47 insertions(+)
>>>>>
>>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>>> index ec098b3..b385cd0 100644
>>>>> --- a/arch/powerpc/mm/numa.c
>>>>> +++ b/arch/powerpc/mm/numa.c
>>>>> @@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>>>  	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>>>>>  }
>>>>>  
>>>>> +static void __init node_associativity_setup(void)
>>>>
>>>> This should really be called "find_possible_nodes()" or something more
>>>> descriptive.
>>>
>>> Okay.
>>>>
>>>>> +{
>>>>> +	struct device_node *rtas;
>>>>> +
>>>>> +	rtas = of_find_node_by_path("/rtas");
>>>>> +	if (rtas) {
>>>>
>>>> If you just short-circuit that return the whole function body can be
>>>> deintented, making it significantly more readable.
>>>>
>>>> ie:
>>>> +	rtas = of_find_node_by_path("/rtas");
>>>> +	if (!rtas)
>>>> +		return;
>>>
>>> Okay.
>>>
>>>>
>>>>> +		const __be32 *prop;
>>>>> +		u32 len, entries, numnodes, i;
>>>>> +
>>>>> +		prop = of_get_property(rtas,
>>>>> +					"ibm,current-associativity-domains", &len);
>>>>
>>>> Please don't use of_get_property() in new code, we have much better
>>>> accessors these days, which do better error checking and handle the
>>>> endian conversions for you.
>>>>
>>>> In this case you'd use eg:
>>>>
>>>> 	u32 entries;
>>>> 	rc = of_property_read_u32(rtas, "ibm,current-associativity-domains", &entries);
>>>
>>> The property 'ibm,current-associativity-domains' has the same format as the property
>>> 'ibm,max-associativity-domains' i.e. it is an integer array.  The accessor of_property_read_32,
>>> however, expects it to be an integer singleton value.  Instead, it needs:
>>
>> I think for this case where the property is an array of values you could use
>> of_property_count_elems_of_size() to get the number of elements in the array
>> and then use of_property_read_u32_array() to read the array.
>>
>> -Nathan
> 
> We only need one value from the array which is why I am using,
> 
>>>>> +		numnodes = of_read_number(&prop[min_common_depth], 1);
> 
> With this implementation I do not need to allocate memory for
> an array, nor execute code to read all elements of the array.
>
> Michael

OK, I didn't see that you just needed a single value from the array.

In this case you could do

	of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
				   min_common_depth, &numnodes);

-Nathan

> 
>>
>>>
>>>>
>>>>> +		if (!prop || len < sizeof(unsigned int)) {
>>>>> +			prop = of_get_property(rtas,
>>>>> +					"ibm,max-associativity-domains", &len);
>>>                         if (!prop || len < sizeof(unsigned int))
>>>>> +				goto endit;
>>>>> +		}
>>>>> +
>>>>> +		entries = of_read_number(prop++, 1);
>>>>> +
>>>>> +		if (len < (entries * sizeof(unsigned int)))
>>>>> +			goto endit;
>>>>> +
>>>>> +		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
>>>>> +			entries = min_common_depth;
>>>>> +		else
>>>>> +			entries -= 1;
>>>> 			^
>>>>                         You can't just guess that will be the right entry.
>>>>
>>>> If min_common_depth is < 0 the function should have just returned
>>>> immediately at the top.
>>>
>>> Okay.
>>>
>>>>
>>>> If min_common_depth is outside the range of the property that's a buggy
>>>> device tree, you should print a warning and return.
>>>>
>>>>> +		numnodes = of_read_number(&prop[entries], 1);
>>>>
>>>> 	u32 num_nodes;
>>>> 	rc = of_property_read_u32_index(rtas, "ibm,current-associativity-domains", min_common_depth, &num_nodes);
>>>>> +
>>>>> +		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
>>>>> +			min_common_depth);
>>>>> +
>>>>> +		for (i = 0; i < numnodes; i++) {
>>>>> +			if (!node_possible(i)) {
>>>>> +				setup_node_data(i, 0, 0);
>>>>
>>>> Do we actually need to setup the NODE_DATA() yet? Doing it now ensures
>>>> it will not be allocated node local, which sucks.
>>>
>>> Okay.
>>>
>>>>
>>>>> +				node_set(i, node_possible_map);
>>>>> +			}
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +endit:
>>>>
>>>> "out" would be the normal name.
>>>
>>> Okay.
>>>
>>>>
>>>>> +	if (rtas)
>>>>> +		of_node_put(rtas);
>>>>> +}
>>>>> +
>>>>>  void __init initmem_init(void)
>>>>>  {
>>>>>  	int nid, cpu;
>>>>> @@ -911,6 +956,8 @@ void __init initmem_init(void)
>>>>>  	 */
>>>>
>>>> You need to update the comment above here which is contradicted by the
>>>> new function you're adding.
>>>
>>> Okay.
>>>
>>>>
>>>>>  	nodes_and(node_possible_map, node_possible_map, node_online_map);
>>>>>  
>>>>> +	node_associativity_setup();
>>>>> +
>>>>>  	for_each_online_node(nid) {
>>>>>  		unsigned long start_pfn, end_pfn;
>>>>>  
>>>>
>>>> cheers
>>>>
>>>>
>>>
>>
>>
>
diff mbox series

Patch

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ec098b3..b385cd0 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,51 @@  static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init node_associativity_setup(void)
+{
+	struct device_node *rtas;
+
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		const __be32 *prop;
+		u32 len, entries, numnodes, i;
+
+		prop = of_get_property(rtas,
+					"ibm,current-associativity-domains", &len);
+		if (!prop || len < sizeof(unsigned int)) {
+			prop = of_get_property(rtas,
+					"ibm,max-associativity-domains", &len);
+			goto endit;
+		}
+
+		entries = of_read_number(prop++, 1);
+
+		if (len < (entries * sizeof(unsigned int)))
+			goto endit;
+
+		if ((0 <= min_common_depth) && (min_common_depth <= (entries-1)))
+			entries = min_common_depth;
+		else
+			entries -= 1;
+
+		numnodes = of_read_number(&prop[entries], 1);
+
+		printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
+			min_common_depth);
+
+		for (i = 0; i < numnodes; i++) {
+			if (!node_possible(i)) {
+				setup_node_data(i, 0, 0);
+				node_set(i, node_possible_map);
+			}
+		}
+	}
+
+endit:
+	if (rtas)
+		of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
 	int nid, cpu;
@@ -911,6 +956,8 @@  void __init initmem_init(void)
 	 */
 	nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+	node_associativity_setup();
+
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;