diff mbox series

[3/3] powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

Message ID 1607057327-29822-4-git-send-email-ego@linux.vnet.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series Extend Parsing "ibm, thread-groups" for Shared-L2 information | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (a1aeabd25a36d9e019381278e543e2d538dd44a7)
snowpatch_ozlabs/build-ppc64le warning Build succeeded but added 2 new sparse warnings
snowpatch_ozlabs/build-ppc64be warning Build succeeded but added 2 new sparse warnings
snowpatch_ozlabs/build-ppc64e warning Build succeeded but added 2 new sparse warnings
snowpatch_ozlabs/build-pmac32 fail Build failed!
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 22 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Gautham R Shenoy Dec. 4, 2020, 4:48 a.m. UTC
From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On POWER platforms where only some groups of threads within a core
share the L2-cache (indicated by the ibm,thread-groups device-tree
property), we currently print the incorrect shared_cpu_map/list for
L2-cache in the sysfs.

This patch reports the correct shared_cpu_map/list on such platforms.

Example:
On a platform with "ibm,thread-groups" set to
                 00000001 00000002 00000004 00000000
                 00000002 00000004 00000006 00000001
                 00000003 00000005 00000007 00000002
                 00000002 00000004 00000000 00000002
                 00000004 00000006 00000001 00000003
                 00000005 00000007

This indicates that threads {0,2,4,6} in the core share the L2-cache
and threads {1,3,5,7} in the core share the L2 cache.

However, without the patch, the shared_cpu_map/list for L2 for CPUs 0,
1 is reported in the sysfs as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,000000ff

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000ff

With the patch, the shared_cpu_map/list for L2 cache for CPUs 0, 1 is
correctly reported as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/cacheinfo.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Srikar Dronamraju Dec. 7, 2020, 1:11 p.m. UTC | #1
* Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:47]:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> 
> +extern bool thread_group_shares_l2;
>  /*
>   * On big-core systems, each core has two groups of CPUs each of which
>   * has its own L1-cache. The thread-siblings which share l1-cache with
>   * @cpu can be obtained via cpu_smallcore_mask().
> + *
> + * On some big-core systems, the L2 cache is shared only between some
> + * groups of siblings. This is already parsed and encoded in
> + * cpu_l2_cache_mask().
>   */
>  static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache)
>  {
>  	if (cache->level == 1)
>  		return cpu_smallcore_mask(cpu);
> +	if (cache->level == 2 && thread_group_shares_l2)
> +		return cpu_l2_cache_mask(cpu);
> 
>  	return &cache->shared_cpu_map;

As pointed with lkp@intel.org, we need to do this only with #CONFIG_SMP,
even for cache->level = 1 too.

I agree that we are displaying shared_cpu_map correctly. Should we have also
update /clear shared_cpu_map in the first place. For example:- If for a P9
core with CPUs 0-7, the cache->shared_cpu_map for L1 would have 0-7 but
would display 0,2,4,6.

The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not
be released. Is this as expected?
Gautham R Shenoy Dec. 8, 2020, 5:56 p.m. UTC | #2
On Mon, Dec 07, 2020 at 06:41:38PM +0530, Srikar Dronamraju wrote:
> * Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:47]:
> 
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > 
> > 
> > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > ---
> > 
> > +extern bool thread_group_shares_l2;
> >  /*
> >   * On big-core systems, each core has two groups of CPUs each of which
> >   * has its own L1-cache. The thread-siblings which share l1-cache with
> >   * @cpu can be obtained via cpu_smallcore_mask().
> > + *
> > + * On some big-core systems, the L2 cache is shared only between some
> > + * groups of siblings. This is already parsed and encoded in
> > + * cpu_l2_cache_mask().
> >   */
> >  static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache)
> >  {
> >  	if (cache->level == 1)
> >  		return cpu_smallcore_mask(cpu);
> > +	if (cache->level == 2 && thread_group_shares_l2)
> > +		return cpu_l2_cache_mask(cpu);
> > 
> >  	return &cache->shared_cpu_map;
> 
> As pointed with lkp@intel.org, we need to do this only with #CONFIG_SMP,
> even for cache->level = 1 too.

Yes, I have fixed that in the next version.

> 
> I agree that we are displaying shared_cpu_map correctly. Should we have also
> update /clear shared_cpu_map in the first place. For example:- If for a P9
> core with CPUs 0-7, the cache->shared_cpu_map for L1 would have 0-7 but
> would display 0,2,4,6.
> 
> The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not
> be released. Is this as expected?

cacheinfo populates the cache->shared_cpu_map on the basis of which
CPUs share the common device-tree node for a particular cache.  There
is one l1-cache object in the device-tree for a CPU node corresponding
to a big-core. That the L1 is further split between the threads of the
core is shown using ibm,thread-groups.

The ideal thing would be to add a "group_leader" field to "struct
cache" so that we can create separate cache objects , one per thread
group. I will take a stab at this in the v2.

Thanks for the review comments.



> 
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju
Srikar Dronamraju Dec. 9, 2020, 8:39 a.m. UTC | #3
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-12-08 23:26:47]:

> > The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not
> > be released. Is this as expected?
> 
> cacheinfo populates the cache->shared_cpu_map on the basis of which
> CPUs share the common device-tree node for a particular cache.  There
> is one l1-cache object in the device-tree for a CPU node corresponding
> to a big-core. That the L1 is further split between the threads of the
> core is shown using ibm,thread-groups.
> 

Yes.

> The ideal thing would be to add a "group_leader" field to "struct
> cache" so that we can create separate cache objects , one per thread
> group. I will take a stab at this in the v2.
> 

I am not saying this needs to be done immediately. We could add a TODO and
get it done later. Your patch is not making it worse. Its just that there is
still something more left to be done.
Gautham R Shenoy Dec. 9, 2020, 9:07 a.m. UTC | #4
On Wed, Dec 09, 2020 at 02:09:21PM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-12-08 23:26:47]:
> 
> > > The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not
> > > be released. Is this as expected?
> > 
> > cacheinfo populates the cache->shared_cpu_map on the basis of which
> > CPUs share the common device-tree node for a particular cache.  There
> > is one l1-cache object in the device-tree for a CPU node corresponding
> > to a big-core. That the L1 is further split between the threads of the
> > core is shown using ibm,thread-groups.
> > 
> 
> Yes.
> 
> > The ideal thing would be to add a "group_leader" field to "struct
> > cache" so that we can create separate cache objects , one per thread
> > group. I will take a stab at this in the v2.
> > 
> 
> I am not saying this needs to be done immediately. We could add a TODO and
> get it done later. Your patch is not making it worse. Its just that there is
> still something more left to be done.

Yeah, it needs to be fixed but it may not be a 5.11 target. For now I
will fix this patch to take care of the build errors on !PPC64 !SMT
configs. I will post a separate series for making cacheinfo.c aware of
thread-groups at the time of construction of the cache-chain.

> 
> -- 
> Thanks and Regards
> Srikar Dronamraju
diff mbox series

Patch

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 65ab9fc..1cc8f37 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -651,15 +651,22 @@  static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
 	return dev->id;
 }
 
+extern bool thread_group_shares_l2;
 /*
  * On big-core systems, each core has two groups of CPUs each of which
  * has its own L1-cache. The thread-siblings which share l1-cache with
  * @cpu can be obtained via cpu_smallcore_mask().
+ *
+ * On some big-core systems, the L2 cache is shared only between some
+ * groups of siblings. This is already parsed and encoded in
+ * cpu_l2_cache_mask().
  */
 static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache)
 {
 	if (cache->level == 1)
 		return cpu_smallcore_mask(cpu);
+	if (cache->level == 2 && thread_group_shares_l2)
+		return cpu_l2_cache_mask(cpu);
 
 	return &cache->shared_cpu_map;
 }