
[RFC,v2,4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device

Message ID 20210525132216.1239259-5-kjain@linux.ibm.com (mailing list archive)
State Superseded
Series Add perf interface to expose nvdimm

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (112f47a1484ddca610b70cbe4a99f0d0f1701daf)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 145 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Kajol Jain May 25, 2021, 1:22 p.m. UTC
Patch here adds cpu hotplug functions to nvdimm pmu.
It adds cpumask to designate a cpu to make HCALL to
collect the counter data for the nvdimm device and
update ABI documentation accordingly.

Result in power9 lpar system:
command:# cat /sys/devices/nmem0/cpumask
0

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem |  6 ++
 arch/powerpc/platforms/pseries/papr_scm.c     | 61 +++++++++++++++++++
 include/linux/nd.h                            | 17 ++++--
 3 files changed, 79 insertions(+), 5 deletions(-)

Comments

Peter Zijlstra May 25, 2021, 2:16 p.m. UTC | #1
On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote:
> Patch here adds cpu hotplug functions to nvdimm pmu.

I'm thinking "Patch here" qualifies for "This patch", see
Documentation/process/submitting-patches.rst .

> It adds cpumask to designate a cpu to make HCALL to
> collect the counter data for the nvdimm device and
> update ABI documentation accordingly.
> 
> Result in power9 lpar system:
> command:# cat /sys/devices/nmem0/cpumask
> 0

Is this specific to the papr thing, or should this be in generic nvdimm
code?
Kajol Jain May 26, 2021, 7:26 a.m. UTC | #2
On 5/25/21 7:46 PM, Peter Zijlstra wrote:
> On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote:
>> Patch here adds cpu hotplug functions to nvdimm pmu.
> 
> I'm thinking "Patch here" qualifies for "This patch", see
> Documentation/process/submitting-patches.rst .
> 
Hi Peter,
   I will reword this commit message.

>> It adds cpumask to designate a cpu to make HCALL to
>> collect the counter data for the nvdimm device and
>> update ABI documentation accordingly.
>>
>> Result in power9 lpar system:
>> command:# cat /sys/devices/nmem0/cpumask
>> 0
> 
> Is this specific to the papr thing, or should this be in generic nvdimm
> code?

This code is not specific to papr device and we can move it to
generic nvdimm interface. But do we need to add some checks on whether
any arch/platform specific driver want that support or we can assume 
that this will be something needed by all platforms?

Thanks,
Kajol Jain
Peter Zijlstra May 26, 2021, 8:32 a.m. UTC | #3
On Wed, May 26, 2021 at 12:56:58PM +0530, kajoljain wrote:
> On 5/25/21 7:46 PM, Peter Zijlstra wrote:
> > On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote:

> >> It adds cpumask to designate a cpu to make HCALL to
> >> collect the counter data for the nvdimm device and
> >> update ABI documentation accordingly.
> >>
> >> Result in power9 lpar system:
> >> command:# cat /sys/devices/nmem0/cpumask
> >> 0
> > 
> > Is this specific to the papr thing, or should this be in generic nvdimm
> > code?
> 
> This code is not specific to papr device and we can move it to
> generic nvdimm interface. But do we need to add some checks on whether
> any arch/platform specific driver want that support or we can assume 
> that this will be something needed by all platforms?

I'm a complete NVDIMM n00b, but to me it would appear they would have to
conform to the normal memory hierarchy and would thus always be
per-node.

Also, if/when deviation from this rule is observed, we can always
rework/extend this. For now I think it would make sense to have the
per-node ness of the thing expressed in the generic layer.
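
For illustration only, a rough sketch (hypothetical helper, not part of
this series) of how a generic nvdimm PMU layer could pick the default
counter-access CPU from the device's NUMA node, along the lines suggested
above:

/*
 * Hypothetical sketch: derive the default counter-access CPU from the
 * nvdimm device's NUMA node, falling back to any online CPU.
 */
static int nvdimm_pmu_default_cpu(struct device *dev)
{
	int node = dev_to_node(dev);
	int cpu;

	if (node == NUMA_NO_NODE)
		return raw_smp_processor_id();

	cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_any(cpu_online_mask);

	return cpu;
}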
Aneesh Kumar K V May 26, 2021, 8:45 a.m. UTC | #4
On 5/26/21 12:56 PM, kajoljain wrote:
> 
> 
> On 5/25/21 7:46 PM, Peter Zijlstra wrote:
>> On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote:
>>> Patch here adds cpu hotplug functions to nvdimm pmu.
>>
>> I'm thinking "Patch here" qualifies for "This patch", see
>> Documentation/process/submitting-patches.rst .
>>
> Hi Peter,
>     I will reword this commit message.
> 
>>> It adds cpumask to designate a cpu to make HCALL to
>>> collect the counter data for the nvdimm device and
>>> update ABI documentation accordingly.
>>>
>>> Result in power9 lpar system:
>>> command:# cat /sys/devices/nmem0/cpumask
>>> 0
>>
>> Is this specific to the papr thing, or should this be in generic nvdimm
>> code?
> 
> This code is not specific to papr device and we can move it to
> generic nvdimm interface. But do we need to add some checks on whether
> any arch/platform specific driver want that support or we can assume
> that this will be something needed by all platforms?
> 

It says the cpu that should be used to make the hcall. That makes it 
PAPR specific.

-aneesh
Kajol Jain May 26, 2021, 9:08 a.m. UTC | #5
On 5/26/21 2:15 PM, Aneesh Kumar K.V wrote:
> On 5/26/21 12:56 PM, kajoljain wrote:
>>
>>
>> On 5/25/21 7:46 PM, Peter Zijlstra wrote:
>>> On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote:
>>>> Patch here adds cpu hotplug functions to nvdimm pmu.
>>>
>>> I'm thinking "Patch here" qualifies for "This patch", see
>>> Documentation/process/submitting-patches.rst .
>>>
>> Hi Peter,
>>     I will reword this commit message.
>>
>>>> It adds cpumask to designate a cpu to make HCALL to
>>>> collect the counter data for the nvdimm device and
>>>> update ABI documentation accordingly.
>>>>
>>>> Result in power9 lpar system:
>>>> command:# cat /sys/devices/nmem0/cpumask
>>>> 0
>>>
>>> Is this specific to the papr thing, or should this be in generic nvdimm
>>> code?
>>
>> This code is not specific to papr device and we can move it to
>> generic nvdimm interface. But do we need to add some checks on whether
>> any arch/platform specific driver want that support or we can assume
>> that this will be something needed by all platforms?
>>
> 
> It says the cpu that should be used to make the hcall. That makes it PAPR specific.

Hi Aneesh,
  The hcall mentioned in the commit message just refers to the method we
use to get the counter data. But adding a cpumask to a PMU is not specific
to powerpc. So, in case other platforms/archs want to enable the hotplug
feature, they can reuse the same code, and hence we can move it to the
generic nvdimm interface.

Our concern is mainly about whether it is right to assume, from the
common code's point of view, that if the cpumask attr is NULL the common
code can add cpumask support for it, or whether we need an explicit flag
for the device to request it.
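
For example, the NULL-attribute option could look roughly like this in
the common registration path (sketch only, reusing names from this
patch):

	/*
	 * Sketch: if the registering driver did not supply its own
	 * cpumask attribute group, let the common nvdimm code install
	 * a default one.
	 */
	if (!nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
		nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR] =
					&nvdimm_pmu_cpumask_group;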

Thanks,
Kajol Jain
> 
> -aneesh
>
Kajol Jain May 28, 2021, 7:53 a.m. UTC | #6
On 5/26/21 2:02 PM, Peter Zijlstra wrote:
> On Wed, May 26, 2021 at 12:56:58PM +0530, kajoljain wrote:
>> On 5/25/21 7:46 PM, Peter Zijlstra wrote:
>>> On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote:
> 
>>>> It adds cpumask to designate a cpu to make HCALL to
>>>> collect the counter data for the nvdimm device and
>>>> update ABI documentation accordingly.
>>>>
>>>> Result in power9 lpar system:
>>>> command:# cat /sys/devices/nmem0/cpumask
>>>> 0
>>>
>>> Is this specific to the papr thing, or should this be in generic nvdimm
>>> code?
>>
>> This code is not specific to papr device and we can move it to
>> generic nvdimm interface. But do we need to add some checks on whether
>> any arch/platform specific driver want that support or we can assume 
>> that this will be something needed by all platforms?
> 
> I'm a complete NVDIMM n00b, but to me it would appear they would have to
> conform to the normal memory hierarchy and would thus always be
> per-node.
> 
> Also, if/when deviation from this rule is observed, we can always
> rework/extend this. For now I think it would make sense to have the
> per-node ness of the thing expressed in the generic layer.
> 

Hi Peter,
  Thanks for the suggestion. I will send a new RFC patchset with these changes.

Thanks,
Kajol Jain

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 38c4daf65af2..986df1691914 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -76,6 +76,12 @@  Description:	(RO) Attribute group to describe the magic bits
 		For example::
 		    noopstat = "event=0x1"
 
+What:		/sys/devices/nmemX/cpumask
+Date:		May 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:	(RO) This sysfs file exposes the cpumask which is designated to make
+                HCALLs to retrieve nvdimm pmu event counter data.
+
 What:		/sys/devices/nmemX/events
 Date:		May 2021
 Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f2d57da98ff4..76121d876b7f 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -339,6 +339,28 @@  static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
 	return 0;
 }
 
+static ssize_t cpumask_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *nvdimm_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static const struct attribute_group nvdimm_pmu_cpumask_group = {
+	.attrs = nvdimm_cpumask_attrs,
+};
+
 PMU_FORMAT_ATTR(event, "config:0-4");
 
 static struct attribute *nvdimm_pmu_format_attr[] = {
@@ -459,6 +481,24 @@  static void papr_scm_pmu_del(struct perf_event *event, int flags)
 	papr_scm_pmu_read(event);
 }
 
+static int nvdimm_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+	int target;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	if (cpu != nd_pmu->cpu)
+		return 0;
+
+	target = cpumask_last(cpu_active_mask);
+	if (target < 0 || target >= nr_cpu_ids)
+		return -1;
+
+	nd_pmu->cpu = target;
+	return 0;
+}
+
 static ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
 				  char *buf)
 {
@@ -603,6 +643,7 @@  static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu
 	/* Fill attribute groups for the nvdimm pmu device */
 	nd_pmu->attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
 	nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR] = nvdimm_pmu_events_group;
+	nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = &nvdimm_pmu_cpumask_group;
 	nd_pmu->attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
 
 	kfree(single_stats);
@@ -652,6 +693,20 @@  static void papr_scm_pmu_register(struct papr_scm_priv *p)
 	nd_pmu->read = papr_scm_pmu_read;
 	nd_pmu->add = papr_scm_pmu_add;
 	nd_pmu->del = papr_scm_pmu_del;
+	nd_pmu->cpu = raw_smp_processor_id();
+
+	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				     "perf/nvdimm:online",
+			      NULL, nvdimm_pmu_offline_cpu);
+	if (rc < 0)
+		goto pmu_cpuhp_setup_err;
+
+	nd_pmu->cpuhp_state = rc;
+
+	/* Register the pmu instance for cpu hotplug */
+	rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	if (rc)
+		goto cpuhp_instance_err;
 
 	rc = register_nvdimm_pmu(nd_pmu, p->pdev);
 	if (rc)
@@ -665,6 +720,10 @@  static void papr_scm_pmu_register(struct papr_scm_priv *p)
 	return;
 
 pmu_register_err:
+	cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+cpuhp_instance_err:
+	cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+pmu_cpuhp_setup_err:
 	nvdimm_pmu_mem_free(nd_pmu);
 	kfree(p->nvdimm_events_map);
 pmu_check_events_err:
@@ -675,6 +734,8 @@  static void papr_scm_pmu_register(struct papr_scm_priv *p)
 
 static void nvdimm_pmu_uinit(struct nvdimm_pmu *nd_pmu)
 {
+	cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
 	unregister_nvdimm_pmu(&nd_pmu->pmu);
 	nvdimm_pmu_mem_free(nd_pmu);
 	kfree(nd_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index a0e0619256be..177795413ab3 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -28,7 +28,8 @@  enum nvdimm_claim_class {
 /* Event attribute array index */
 #define NVDIMM_PMU_FORMAT_ATTR		0
 #define NVDIMM_PMU_EVENT_ATTR		1
-#define NVDIMM_PMU_NULL_ATTR		2
+#define NVDIMM_PMU_CPUMASK_ATTR		2
+#define NVDIMM_PMU_NULL_ATTR		3
 
 /**
  * struct nvdimm_pmu - data structure for nvdimm perf driver
@@ -37,7 +38,10 @@  enum nvdimm_claim_class {
  * @pmu: pmu data structure for nvdimm performance stats.
  * @dev: nvdimm device pointer.
  * @functions(event_init/add/del/read): platform specific pmu functions.
- * @attr_groups: data structure for events and formats.
+ * @attr_groups: data structure for events, formats and cpumask
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
  */
 struct nvdimm_pmu {
 	const char *name;
@@ -49,10 +53,13 @@  struct nvdimm_pmu {
 	void (*read)(struct perf_event *event);
 	/*
 	 * Attribute groups for the nvdimm pmu. Index 0 used for
-	 * format attribute, index 1 used for event attribute and
-	 * index 2 kept as NULL.
+	 * format attribute, index 1 used for event attribute,
+	 * index 2 used for cpumask attribute and index 3 kept as NULL.
 	 */
-	const struct attribute_group *attr_groups[3];
+	const struct attribute_group *attr_groups[4];
+	int cpu;
+	struct hlist_node node;
+	enum cpuhp_state cpuhp_state;
 };
 
 int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
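
With this in place, the designated CPU can be read back from sysfs and
used explicitly, for example (illustrative only; the event name is taken
from the example earlier in the ABI file, and the available events depend
on the platform):

command:# cat /sys/devices/nmem0/cpumask
0
command:# perf stat -C 0 -e nmem0/noopstat/ sleep 1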