[v3,1/3] lib: Restrict cpumask_local_spread to housekeeping CPUs

Message ID: 20200623192331.215557-2-nitesh@redhat.com
State: New
Series: Preventing job distribution to isolated CPUs

Commit Message

Nitesh Narayan Lal June 23, 2020, 7:23 p.m. UTC
From: Alex Belits <abelits@marvell.com>

The current implementation of cpumask_local_spread() does not respect
isolated CPUs: even if a CPU has been isolated for a real-time task, the
function will still return it to the caller, which may then pin IRQ
threads to it. Having these unwanted IRQ threads on an isolated CPU adds
latency overhead.

Restrict the CPUs returned for spreading IRQs to the available
housekeeping CPUs.

Signed-off-by: Alex Belits <abelits@marvell.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 lib/cpumask.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
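
For context, this is the pattern a typical consumer follows when using
cpumask_local_spread() to spread queue IRQs across CPUs (a minimal
illustrative sketch; the helper and its parameters are hypothetical, while
cpumask_local_spread(), get_cpu_mask() and irq_set_affinity_hint() are real
kernel APIs):

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/* Hypothetical driver helper: spread nvec queue IRQs over CPUs local to
 * the device's NUMA node. With this patch applied, cpumask_local_spread()
 * only ever returns housekeeping CPUs, so no affinity hint can land on an
 * isolated CPU. */
static void example_spread_queue_irqs(const int *irqs, unsigned int nvec,
				      int node)
{
	unsigned int v;

	for (v = 0; v < nvec; v++) {
		unsigned int cpu = cpumask_local_spread(v, node);

		irq_set_affinity_hint(irqs[v], get_cpu_mask(cpu));
	}
}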

Comments

Frederic Weisbecker June 24, 2020, 12:13 p.m. UTC | #1
On Tue, Jun 23, 2020 at 03:23:29PM -0400, Nitesh Narayan Lal wrote:
> From: Alex Belits <abelits@marvell.com>
> 
> The current implementation of cpumask_local_spread() does not respect the
> isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task,
> it will return it to the caller for pinning of its IRQ threads. Having
> these unwanted IRQ threads on an isolated CPU adds up to a latency
> overhead.
> 
> Restrict the CPUs that are returned for spreading IRQs only to the
> available housekeeping CPUs.
> 
> Signed-off-by: Alex Belits <abelits@marvell.com>
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  lib/cpumask.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index fb22fb266f93..d73104995981 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -6,6 +6,7 @@
>  #include <linux/export.h>
>  #include <linux/memblock.h>
>  #include <linux/numa.h>
> +#include <linux/sched/isolation.h>
>  
>  /**
>   * cpumask_next - get the next cpu in a cpumask
> @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
>   */
>  unsigned int cpumask_local_spread(unsigned int i, int node)
>  {
> -	int cpu;
> +	int cpu, hk_flags;
> +	const struct cpumask *mask;
>  
> +	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;

This should be HK_FLAG_MANAGED_IRQ instead of HK_FLAG_WQ, since this
function seems to be used mostly to select CPUs to affine managed IRQs.
In the end, the cpumask you pass to the IRQ core will be filtered
through HK_FLAG_MANAGED_IRQ anyway, so it is better to select an
appropriate one in the first place and avoid an empty cpumask
intersection.

Now, even if cpumask_local_spread() is currently mostly used to select
managed-IRQ targets, the name and role of the function don't refer to
that. Probably cpumask_local_spread() should take an HK_ flag as a
parameter so that it can correctly handle future users?

That being said, I plan to merge HK_FLAG_RCU, HK_FLAG_MISC, HK_FLAG_SCHED,
HK_FLAG_WQ and HK_FLAG_TIMER into HK_FLAG_UNBOUND, since it doesn't make
sense to keep them all separate. The actual flag used inside
cpumask_local_spread() could then end up being HK_FLAG_DOMAIN |
HK_FLAG_UNBOUND. So you probably don't need to worry about that; just
replace the HK_FLAG_WQ in your patch with HK_FLAG_MANAGED_IRQ.
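
Concretely, that suggestion amounts to the following change inside the
patched function (a sketch of the proposed replacement, not what v3 of
the patch contains):

	/* Target managed-IRQ housekeeping CPUs rather than
	 * workqueue housekeeping CPUs. */
	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ;
	mask = housekeeping_cpumask(hk_flags);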

Thanks.
Andrew Morton June 24, 2020, 7:26 p.m. UTC | #2
On Tue, 23 Jun 2020 15:23:29 -0400 Nitesh Narayan Lal <nitesh@redhat.com> wrote:

> From: Alex Belits <abelits@marvell.com>
> 
> The current implementation of cpumask_local_spread() does not respect the
> isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task,
> it will return it to the caller for pinning of its IRQ threads. Having
> these unwanted IRQ threads on an isolated CPU adds up to a latency
> overhead.
> 
> Restrict the CPUs that are returned for spreading IRQs only to the
> available housekeeping CPUs.
> 
> ...
>
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -6,6 +6,7 @@
>  #include <linux/export.h>
>  #include <linux/memblock.h>
>  #include <linux/numa.h>
> +#include <linux/sched/isolation.h>
>  
>  /**
>   * cpumask_next - get the next cpu in a cpumask
> @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
>   */
>  unsigned int cpumask_local_spread(unsigned int i, int node)
>  {
> -	int cpu;
> +	int cpu, hk_flags;
> +	const struct cpumask *mask;
>  
> +	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
> +	mask = housekeeping_cpumask(hk_flags);
>  	/* Wrap: we always want a cpu. */
> -	i %= num_online_cpus();
> +	i %= cpumask_weight(mask);
>  
>  	if (node == NUMA_NO_NODE) {
> -		for_each_cpu(cpu, cpu_online_mask)
> +		for_each_cpu(cpu, mask) {
>  			if (i-- == 0)
>  				return cpu;
> +		}
>  	} else {
>  		/* NUMA first. */
> -		for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
> +		for_each_cpu_and(cpu, cpumask_of_node(node), mask) {
>  			if (i-- == 0)
>  				return cpu;
> +		}
>  
> -		for_each_cpu(cpu, cpu_online_mask) {
> +		for_each_cpu(cpu, mask) {
>  			/* Skip NUMA nodes, done above. */
>  			if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
>  				continue;

Are you aware of these changes to cpumask_local_spread()?
https://lore.kernel.org/lkml/1582768688-2314-1-git-send-email-zhangshaokun@hisilicon.com/

I don't see a lot of overlap but it would be nice for you folks to
check each other's homework ;)
Nitesh Narayan Lal June 24, 2020, 8:37 p.m. UTC | #3
On 6/24/20 8:13 AM, Frederic Weisbecker wrote:
> On Tue, Jun 23, 2020 at 03:23:29PM -0400, Nitesh Narayan Lal wrote:
>> From: Alex Belits <abelits@marvell.com>
>>
>> The current implementation of cpumask_local_spread() does not respect the
>> isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task,
>> it will return it to the caller for pinning of its IRQ threads. Having
>> these unwanted IRQ threads on an isolated CPU adds up to a latency
>> overhead.
>>
>> Restrict the CPUs that are returned for spreading IRQs only to the
>> available housekeeping CPUs.
>>
>> Signed-off-by: Alex Belits <abelits@marvell.com>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  lib/cpumask.c | 16 +++++++++++-----
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/cpumask.c b/lib/cpumask.c
>> index fb22fb266f93..d73104995981 100644
>> --- a/lib/cpumask.c
>> +++ b/lib/cpumask.c
>> @@ -6,6 +6,7 @@
>>  #include <linux/export.h>
>>  #include <linux/memblock.h>
>>  #include <linux/numa.h>
>> +#include <linux/sched/isolation.h>
>>  
>>  /**
>>   * cpumask_next - get the next cpu in a cpumask
>> @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
>>   */
>>  unsigned int cpumask_local_spread(unsigned int i, int node)
>>  {
>> -	int cpu;
>> +	int cpu, hk_flags;
>> +	const struct cpumask *mask;
>>  
>> +	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
> This should be HK_FLAG_MANAGED_IRQ instead of HK_FLAG_WQ since this
> function seem to be used mostly to select CPUs to affine managed IRQs.

IIRC, there are drivers such as ixgbe that use cpumask_local_spread()
while affining normal (non-managed) IRQs as well.
But I can recheck that.

> In the end the cpumask you pass to IRQ core will be filtered throughout
> HK_FLAG_MANAGED_IRQ anyway so better select an appropriate one in the
> first place to avoid an empty cpumask intersection.
>
> Now even if cpumask_local_spread() is currently mostly used to select
> managed irq targets, the name and role of the function don't refer to that.
> Probably cpumask_local_spread() should take HK_ flag in parameter so that
> it can correctly handle future users?
>
> That being said, I plan to merge HK_FLAG_RCU, HK_FLAG_MISC, HK_FLAG_SCHED,
> HK_FLAG_WQ and HK_FLAG_TIMER into HK_FLAG_UNBOUND since it doesn't make sense
> to divide them all.

That would be nice.

>  And the actual flag used inside cpumask_local_spread()
> could end up being HK_FLAG_DOMAIN | HK_FLAG_UNBOUND. So probably you don't
> need to worry about that and just change the HK_FLAG_WQ in your patch
> with HK_FLAG_MANAGED_IRQ.
>
> Thanks.
Nitesh Narayan Lal June 24, 2020, 8:38 p.m. UTC | #4
On 6/24/20 3:26 PM, Andrew Morton wrote:
> On Tue, 23 Jun 2020 15:23:29 -0400 Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>
>> ...
> Are you aware of these changes to cpumask_local_spread()?
> https://lore.kernel.org/lkml/1582768688-2314-1-git-send-email-zhangshaokun@hisilicon.com/
>
> I don't see a lot of overlap but it would be nice for you folks to
> check each other's homework ;)

Sure, I will take a look.
Thanks

Nitesh Narayan Lal June 24, 2020, 11:31 p.m. UTC | #5
On 6/24/20 3:26 PM, Andrew Morton wrote:
> On Tue, 23 Jun 2020 15:23:29 -0400 Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>
>> ...
> Are you aware of these changes to cpumask_local_spread()?
> https://lore.kernel.org/lkml/1582768688-2314-1-git-send-email-zhangshaokun@hisilicon.com/
>
> I don't see a lot of overlap but it would be nice for you folks to
> check each other's homework ;)

I took a look at the patch and, as you said, there is not much overlap.
The idea of keeping isolated CPUs untouched for RT environments will be
valid for the optimization that Shaokun is suggesting as well.
I am not sure about the current state of that patch-set, but I will
certainly keep an eye on it.

Shaokun Zhang June 29, 2020, 9:01 a.m. UTC | #6
Hi Andrew,

On 2020/6/25 3:26, Andrew Morton wrote:
> On Tue, 23 Jun 2020 15:23:29 -0400 Nitesh Narayan Lal <nitesh@redhat.com> wrote:
> 
>> ...
> 
> Are you aware of these changes to cpumask_local_spread()?
> https://lore.kernel.org/lkml/1582768688-2314-1-git-send-email-zhangshaokun@hisilicon.com/
> 
> I don't see a lot of overlap but it would be nice for you folks to
> check each other's homework ;)

Yeah, it's a different issue from Nitesh's. Our patch has been in
linux-next for a long time; will it be merged into Linus's tree?

Thanks,
Shaokun

Patch

diff --git a/lib/cpumask.c b/lib/cpumask.c
index fb22fb266f93..d73104995981 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -6,6 +6,7 @@ 
 #include <linux/export.h>
 #include <linux/memblock.h>
 #include <linux/numa.h>
+#include <linux/sched/isolation.h>
 
 /**
  * cpumask_next - get the next cpu in a cpumask
@@ -205,22 +206,27 @@  void __init free_bootmem_cpumask_var(cpumask_var_t mask)
  */
 unsigned int cpumask_local_spread(unsigned int i, int node)
 {
-	int cpu;
+	int cpu, hk_flags;
+	const struct cpumask *mask;
 
+	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
+	mask = housekeeping_cpumask(hk_flags);
 	/* Wrap: we always want a cpu. */
-	i %= num_online_cpus();
+	i %= cpumask_weight(mask);
 
 	if (node == NUMA_NO_NODE) {
-		for_each_cpu(cpu, cpu_online_mask)
+		for_each_cpu(cpu, mask) {
 			if (i-- == 0)
 				return cpu;
+		}
 	} else {
 		/* NUMA first. */
-		for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
+		for_each_cpu_and(cpu, cpumask_of_node(node), mask) {
 			if (i-- == 0)
 				return cpu;
+		}
 
-		for_each_cpu(cpu, cpu_online_mask) {
+		for_each_cpu(cpu, mask) {
 			/* Skip NUMA nodes, done above. */
 			if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
 				continue;
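
A worked example of the resulting behavior (a hypothetical 8-CPU machine
booted with isolcpus=2-7, so the housekeeping mask contains CPUs 0-1 and
has a weight of 2):

/* Hypothetical setup: housekeeping_cpumask(hk_flags) == { 0, 1 }. */
unsigned int a = cpumask_local_spread(0, NUMA_NO_NODE); /* returns 0 */
unsigned int b = cpumask_local_spread(1, NUMA_NO_NODE); /* returns 1 */
unsigned int c = cpumask_local_spread(2, NUMA_NO_NODE); /* 2 % 2 == 0, wraps to 0 */

The isolated CPUs 2-7 are never returned, whereas the pre-patch code,
which iterated cpu_online_mask, could hand any of them back to the caller.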