Message ID | 1607057327-29822-4-git-send-email-ego@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | Extend Parsing "ibm, thread-groups" for Shared-L2 information | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (a1aeabd25a36d9e019381278e543e2d538dd44a7) |
snowpatch_ozlabs/build-ppc64le | warning | Build succeeded but added 2 new sparse warnings |
snowpatch_ozlabs/build-ppc64be | warning | Build succeeded but added 2 new sparse warnings |
snowpatch_ozlabs/build-ppc64e | warning | Build succeeded but added 2 new sparse warnings |
snowpatch_ozlabs/build-pmac32 | fail | Build failed! |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 22 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
* Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:47]: > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > --- > > +extern bool thread_group_shares_l2; > /* > * On big-core systems, each core has two groups of CPUs each of which > * has its own L1-cache. The thread-siblings which share l1-cache with > * @cpu can be obtained via cpu_smallcore_mask(). > + * > + * On some big-core systems, the L2 cache is shared only between some > + * groups of siblings. This is already parsed and encoded in > + * cpu_l2_cache_mask(). > */ > static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache) > { > if (cache->level == 1) > return cpu_smallcore_mask(cpu); > + if (cache->level == 2 && thread_group_shares_l2) > + return cpu_l2_cache_mask(cpu); > > return &cache->shared_cpu_map; As pointed with lkp@intel.org, we need to do this only with #CONFIG_SMP, even for cache->level = 1 too. I agree that we are displaying shared_cpu_map correctly. Should we have also update /clear shared_cpu_map in the first place. For example:- If for a P9 core with CPUs 0-7, the cache->shared_cpu_map for L1 would have 0-7 but would display 0,2,4,6. The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not be released. Is this as expected?
On Mon, Dec 07, 2020 at 06:41:38PM +0530, Srikar Dronamraju wrote: > * Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:47]: > > > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> > > > > > > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > > --- > > > > +extern bool thread_group_shares_l2; > > /* > > * On big-core systems, each core has two groups of CPUs each of which > > * has its own L1-cache. The thread-siblings which share l1-cache with > > * @cpu can be obtained via cpu_smallcore_mask(). > > + * > > + * On some big-core systems, the L2 cache is shared only between some > > + * groups of siblings. This is already parsed and encoded in > > + * cpu_l2_cache_mask(). > > */ > > static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache) > > { > > if (cache->level == 1) > > return cpu_smallcore_mask(cpu); > > + if (cache->level == 2 && thread_group_shares_l2) > > + return cpu_l2_cache_mask(cpu); > > > > return &cache->shared_cpu_map; > > As pointed with lkp@intel.org, we need to do this only with #CONFIG_SMP, > even for cache->level = 1 too. Yes, I have fixed that in the next version. > > I agree that we are displaying shared_cpu_map correctly. Should we have also > update /clear shared_cpu_map in the first place. For example:- If for a P9 > core with CPUs 0-7, the cache->shared_cpu_map for L1 would have 0-7 but > would display 0,2,4,6. > > The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not > be released. Is this as expected? cacheinfo populates the cache->shared_cpu_map on the basis of which CPUs share the common device-tree node for a particular cache. There is one l1-cache object in the device-tree for a CPU node corresponding to a big-core. That the L1 is further split between the threads of the core is shown using ibm,thread-groups. The ideal thing would be to add a "group_leader" field to "struct cache" so that we can create separate cache objects , one per thread group. I will take a stab at this in the v2. Thanks for the review comments. > > > -- > Thanks and Regards > Srikar Dronamraju
* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-12-08 23:26:47]: > > The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not > > be released. Is this as expected? > > cacheinfo populates the cache->shared_cpu_map on the basis of which > CPUs share the common device-tree node for a particular cache. There > is one l1-cache object in the device-tree for a CPU node corresponding > to a big-core. That the L1 is further split between the threads of the > core is shown using ibm,thread-groups. > Yes. > The ideal thing would be to add a "group_leader" field to "struct > cache" so that we can create separate cache objects , one per thread > group. I will take a stab at this in the v2. > I am not saying this needs to be done immediately. We could add a TODO and get it done later. Your patch is not making it worse. Its just that there is still something more left to be done.
On Wed, Dec 09, 2020 at 02:09:21PM +0530, Srikar Dronamraju wrote: > * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-12-08 23:26:47]: > > > > The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not > > > be released. Is this as expected? > > > > cacheinfo populates the cache->shared_cpu_map on the basis of which > > CPUs share the common device-tree node for a particular cache. There > > is one l1-cache object in the device-tree for a CPU node corresponding > > to a big-core. That the L1 is further split between the threads of the > > core is shown using ibm,thread-groups. > > > > Yes. > > > The ideal thing would be to add a "group_leader" field to "struct > > cache" so that we can create separate cache objects , one per thread > > group. I will take a stab at this in the v2. > > > > I am not saying this needs to be done immediately. We could add a TODO and > get it done later. Your patch is not making it worse. Its just that there is > still something more left to be done. Yeah, it needs to be fixed but it may not be a 5.11 target. For now I will fix this patch to take care of the build errors on !PPC64 !SMT configs. I will post a separate series for making cacheinfo.c aware of thread-groups at the time of construction of the cache-chain. > > -- > Thanks and Regards > Srikar Dronamraju
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c index 65ab9fc..1cc8f37 100644 --- a/arch/powerpc/kernel/cacheinfo.c +++ b/arch/powerpc/kernel/cacheinfo.c @@ -651,15 +651,22 @@ static unsigned int index_dir_to_cpu(struct cache_index_dir *index) return dev->id; } +extern bool thread_group_shares_l2; /* * On big-core systems, each core has two groups of CPUs each of which * has its own L1-cache. The thread-siblings which share l1-cache with * @cpu can be obtained via cpu_smallcore_mask(). + * + * On some big-core systems, the L2 cache is shared only between some + * groups of siblings. This is already parsed and encoded in + * cpu_l2_cache_mask(). */ static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache) { if (cache->level == 1) return cpu_smallcore_mask(cpu); + if (cache->level == 2 && thread_group_shares_l2) + return cpu_l2_cache_mask(cpu); return &cache->shared_cpu_map; }