diff mbox series

[v8,15/15] x86/split_lock: Add a sysfs interface to enable/disable split lock detection during run time

Message ID 1556134382-58814-16-git-send-email-fenghua.yu@intel.com
State Not Applicable
Delegated to: David Miller
Headers show
Series x86/split_lock: Enable split lock detection | expand

Commit Message

Fenghua Yu April 24, 2019, 7:33 p.m. UTC
To workaround or debug a split lock issue, the administrator may need to
disable or enable split lock detection during run time without rebooting
the system.

The interface /sys/device/system/cpu/split_lock_detect is added to allow
the administrator to disable or enable split lock detection and show
current split lock detection setting.

Writing [yY1] or [oO][nN] to the file enables split lock detection and
writing [nN0] or [oO][fF] disables split lock detection. Split lock
detection is enabled or disabled on all CPUs.

Reading the file returns current global split lock detection setting:
0: disabled
1: enabled

Add an ABI document entry for /sys/devices/system/cpu/split_lock_detect.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
Not sure if the justification for the sysfs knob is valid. If not, this
patch could be removed from this patch set.

 .../ABI/testing/sysfs-devices-system-cpu      | 22 ++++++++
 arch/x86/kernel/cpu/intel.c                   | 52 ++++++++++++++++++-
 2 files changed, 72 insertions(+), 2 deletions(-)

Comments

Ingo Molnar April 25, 2019, 6:31 a.m. UTC | #1
* Fenghua Yu <fenghua.yu@intel.com> wrote:

> To workaround or debug a split lock issue, the administrator may need to
> disable or enable split lock detection during run time without rebooting
> the system.
> 
> The interface /sys/device/system/cpu/split_lock_detect is added to allow
> the administrator to disable or enable split lock detection and show
> current split lock detection setting.
> 
> Writing [yY1] or [oO][nN] to the file enables split lock detection and
> writing [nN0] or [oO][fF] disables split lock detection. Split lock
> detection is enabled or disabled on all CPUs.
> 
> Reading the file returns current global split lock detection setting:
> 0: disabled
> 1: enabled
> 
> Add an ABI document entry for /sys/devices/system/cpu/split_lock_detect.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
> Not sure if the justification for the sysfs knob is valid. If not, this
> patch could be removed from this patch set.
> 
>  .../ABI/testing/sysfs-devices-system-cpu      | 22 ++++++++
>  arch/x86/kernel/cpu/intel.c                   | 52 ++++++++++++++++++-
>  2 files changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 9605dbd4b5b5..aad7b1698065 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -67,6 +67,28 @@ Description:	Discover NUMA node a CPU belongs to
>  		/sys/devices/system/cpu/cpu42/node2 -> ../../node/node2
>  
>  
> +What:		/sys/devices/system/cpu/split_lock_detect
> +Date:		March 2019
> +Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> +Description:	(RW) Control split lock detection on Intel Tremont and
> +		future CPUs
> +
> +		Reads return split lock detection status:
> +			0: disabled
> +			1: enabled
> +
> +		Writes enable or disable split lock detection:
> +			The first character is one of 'Nn0' or [oO][fF] for off
> +			disables the feature.
> +			The first character is one of 'Yy1' or [oO][nN] for on
> +			enables the feature.
> +
> +		Please note the interface only shows or controls global setting.
> +		During run time, split lock detection on one CPU may be
> +		disabled if split lock operation in kernel code happens on
> +		the CPU. The interface doesn't show or control split lock
> +		detection on individual CPU.

I.e. implementation and possible actual state are out of sync. Why?

Also, if it's a global flag, why waste memory on putting a sysfs knob 
into every CPU's sysfs file?

Finally, why is a debugging facility in sysfs, why not a debugfs knob? 
Using a sysctl would solve the percpu vs. global confusion as well ...

> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -35,6 +35,7 @@
>  DEFINE_PER_CPU(u64, msr_test_ctl_cache);
>  EXPORT_PER_CPU_SYMBOL_GPL(msr_test_ctl_cache);
>  
> +static DEFINE_MUTEX(split_lock_detect_mutex);
>  static bool split_lock_detect_enable;

'enable' is a verb in plain form - which we use for function names.

For variable names that denotes current state we typically use past 
tense, i.e. 'enabled'.

(The only case where we'd us the split_lock_detect_enable name for a flag 
if it's a flag to trigger some sort of enabling action - which this 
isn't.)

Please review the whole series for various naming mishaps.

> +	mutex_lock(&split_lock_detect_mutex);
> +
> +	split_lock_detect_enable = val;
> +
> +	/* Update the split lock detection setting in MSR on all online CPUs. */
> +	on_each_cpu(split_lock_update_msr, NULL, 1);
> +
> +	if (split_lock_detect_enable)
> +		pr_info("enabled\n");
> +	else
> +		pr_info("disabled\n");
> +
> +	mutex_unlock(&split_lock_detect_mutex);

Instead of a mutex, please just use the global atomic debug flag which 
controls the warning printout. By using that flag both for the WARN()ing 
and for controlling MSR state all the races are solved and the code is 
simplified.


Thanks,

	Ingo
Fenghua Yu May 6, 2019, 12:21 a.m. UTC | #2
On Thu, Apr 25, 2019 at 08:31:15AM +0200, Ingo Molnar wrote:
> 
> * Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> > +		disabled if split lock operation in kernel code happens on
> > +		the CPU. The interface doesn't show or control split lock
> > +		detection on individual CPU.
> 
> I.e. implementation and possible actual state are out of sync. Why?
> 
> Also, if it's a global flag, why waste memory on putting a sysfs knob 
> into every CPU's sysfs file?
> 
> Finally, why is a debugging facility in sysfs, why not a debugfs knob? 
> Using a sysctl would solve the percpu vs. global confusion as well ...

Can I put the interface in /sys/kernel/debug/x86/split_lock_detect?

> 
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -35,6 +35,7 @@
> >  DEFINE_PER_CPU(u64, msr_test_ctl_cache);
> >  EXPORT_PER_CPU_SYMBOL_GPL(msr_test_ctl_cache);
> >  
> > +static DEFINE_MUTEX(split_lock_detect_mutex);
> >  static bool split_lock_detect_enable;
> 
> 'enable' is a verb in plain form - which we use for function names.
> 
> For variable names that denotes current state we typically use past 
> tense, i.e. 'enabled'.
> 
> (The only case where we'd us the split_lock_detect_enable name for a flag 
> if it's a flag to trigger some sort of enabling action - which this 
> isn't.)
> 
> Please review the whole series for various naming mishaps.
OK.

> 
> > +	mutex_lock(&split_lock_detect_mutex);
> > +
> > +	split_lock_detect_enable = val;
> > +
> > +	/* Update the split lock detection setting in MSR on all online CPUs. */
> > +	on_each_cpu(split_lock_update_msr, NULL, 1);
> > +
> > +	if (split_lock_detect_enable)
> > +		pr_info("enabled\n");
> > +	else
> > +		pr_info("disabled\n");
> > +
> > +	mutex_unlock(&split_lock_detect_mutex);
> 
> Instead of a mutex, please just use the global atomic debug flag which 
> controls the warning printout. By using that flag both for the WARN()ing 
> and for controlling MSR state all the races are solved and the code is 
> simplified.

So is it OK to define split_lock_debug and use it in #AC handler and in
here?

static atomic_t split_lock_debug;

in #AC handler:

+       if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
+               /* Only warn split lock once */
+               WARN_ONCE(1, "split lock operation detected\n");
+               atomic_set(&split_lock_debug, 0);
+       }

And in split_lock_detect_store(), replace the mutex with split_lock_debug
like this:
 
-       mutex_lock(&split_lock_detect_mutex);
+       while (atomic_cmpxchg(&split_lock_debug, 1, 0))
+              cpu_relax();
.... 
-       mutex_unlock(&split_lock_detect_mutex);
+       atomic_set(&split_lock_debug, 0);
 
Is this right code for sync in both #AC handler and in
split_lock_detect_store()?

Thanks.

-Fenghua
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 9605dbd4b5b5..aad7b1698065 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -67,6 +67,28 @@  Description:	Discover NUMA node a CPU belongs to
 		/sys/devices/system/cpu/cpu42/node2 -> ../../node/node2
 
 
+What:		/sys/devices/system/cpu/split_lock_detect
+Date:		March 2019
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	(RW) Control split lock detection on Intel Tremont and
+		future CPUs
+
+		Reads return split lock detection status:
+			0: disabled
+			1: enabled
+
+		Writes enable or disable split lock detection:
+			The first character is one of 'Nn0' or [oO][fF] for off
+			disables the feature.
+			The first character is one of 'Yy1' or [oO][nN] for on
+			enables the feature.
+
+		Please note the interface only shows or controls global setting.
+		During run time, split lock detection on one CPU may be
+		disabled if split lock operation in kernel code happens on
+		the CPU. The interface doesn't show or control split lock
+		detection on individual CPU.
+
 What:		/sys/devices/system/cpu/cpu#/topology/core_id
 		/sys/devices/system/cpu/cpu#/topology/core_siblings
 		/sys/devices/system/cpu/cpu#/topology/core_siblings_list
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 959ebf25beda..f257d1e92706 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -35,6 +35,7 @@ 
 DEFINE_PER_CPU(u64, msr_test_ctl_cache);
 EXPORT_PER_CPU_SYMBOL_GPL(msr_test_ctl_cache);
 
+static DEFINE_MUTEX(split_lock_detect_mutex);
 static bool split_lock_detect_enable;
 
 /*
@@ -660,7 +661,7 @@  static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
-static void split_lock_update_msr(void)
+static void split_lock_update_msr(void *__unused)
 {
 	if (split_lock_detect_enable) {
 		/* Enable split lock detection */
@@ -682,7 +683,7 @@  static void init_split_lock_detect(struct cpuinfo_x86 *c)
 		rdmsrl(MSR_TEST_CTL, test_ctl_val);
 		this_cpu_write(msr_test_ctl_cache, test_ctl_val);
 
-		split_lock_update_msr();
+		split_lock_update_msr(NULL);
 	}
 }
 
@@ -1114,3 +1115,50 @@  void handle_split_lock_kernel_mode(void)
 	this_cpu_and(msr_test_ctl_cache, ~TEST_CTL_SPLIT_LOCK_DETECT);
 	WARN_ONCE(1, "split lock operation detected\n");
 }
+
+static ssize_t
+split_lock_detect_show(struct device *dev, struct device_attribute *attr,
+		       char *buf)
+{
+	return sprintf(buf, "%u\n", split_lock_detect_enable);
+}
+
+static ssize_t
+split_lock_detect_store(struct device *dev, struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	bool val;
+	int ret;
+
+	ret = strtobool(buf, &val);
+	if (ret)
+		return ret;
+
+	mutex_lock(&split_lock_detect_mutex);
+
+	split_lock_detect_enable = val;
+
+	/* Update the split lock detection setting in MSR on all online CPUs. */
+	on_each_cpu(split_lock_update_msr, NULL, 1);
+
+	if (split_lock_detect_enable)
+		pr_info("enabled\n");
+	else
+		pr_info("disabled\n");
+
+	mutex_unlock(&split_lock_detect_mutex);
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(split_lock_detect);
+
+static int __init split_lock_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+		return -ENODEV;
+
+	return device_create_file(cpu_subsys.dev_root,
+				  &dev_attr_split_lock_detect);
+}
+subsys_initcall(split_lock_init);