[Vivid,SRU] ACPI / processor: Request native thermal interrupt handling via _OSC

Message ID 1459868877-17763-1-git-send-email-kamal@canonical.com
State New

Commit Message

Kamal Mostafa April 5, 2016, 3:07 p.m. UTC
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

BugLink: http://bugs.launchpad.net/bugs/1559923

There are several reports of freezes when the Intel P-state driver
enables the HWP (Hardware P-states) feature on Skylake-based systems.
The root cause has been identified as HWP interrupts causing BIOS code
to freeze.

HWP interrupts use the thermal LVT which can be handled by Linux
natively, but on the affected Skylake-based systems SMM will respond
to it by default.  This is a problem for several reasons:
 - On the affected systems the SMM thermal LVT handler is broken (it
   will crash when invoked) and a BIOS update is necessary to fix it.
 - With thermal interrupts handled in SMM, we lose all of the reporting
   features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
 - Some thermal drivers, like x86-package-temp, depend on the thermal
   threshold interrupts signaled via the thermal LVT.
 - The HWP interrupts are useful for debugging and tuning
   performance (if the kernel can handle them).

For these reasons, native handling of thermal interrupts needs to be
enabled.

This requires some way to tell SMM that the OS can handle thermal
interrupts.  That can be done by using _OSC/_PDC in processor
scope very early during ACPI initialization.

The meaning of _OSC/_PDC bit 12 in processor scope is whether or
not the OS supports native handling of interrupts for Collaborative
Processor Performance Control (CPPC) notifications.  Since on
HWP-capable systems CPPC is a firmware interface to HWP, setting
this bit effectively tells the firmware that the OS will handle
thermal interrupts natively going forward.
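
As an illustration only, the bit-12 request and acknowledgement check
described above can be sketched in standalone C. The macro and function
names below are hypothetical and are not part of the patch. The length
check is written in bytes and is an assumption about what a safe reader
of the reply buffer would require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for bit 12 of the second capabilities DWORD. */
#define OSC_CPPC_NATIVE_INTR (UINT32_C(1) << 12) /* == 0x1000 */

/* Fill the two-DWORD capabilities buffer that the OS passes to _OSC. */
static void osc_build_request(uint32_t capbuf[2])
{
	assert(capbuf != NULL);
	capbuf[0] = 0;                    /* first DWORD left at zero */
	capbuf[1] = OSC_CPPC_NATIVE_INTR; /* request native interrupt handling */
}

/*
 * Return true if the firmware reply still has bit 12 set, which is
 * treated as an acknowledgement. len is the reply size in bytes; the
 * reply must hold at least two DWORDs before ret[1] is read.
 */
static bool osc_reply_acked(const uint32_t *ret, size_t len)
{
	if (ret == NULL || len < 2 * sizeof(uint32_t))
		return false;
	return (ret[1] & OSC_CPPC_NATIVE_INTR) != 0;
}
```

A reply with bit 12 cleared, or one shorter than two DWORDs, is treated
as "not acknowledged" in this sketch.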

For details on _OSC/_PDC refer to:
http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html

To implement the _OSC/_PDC handshake as described, introduce a new
function, acpi_early_processor_osc(), that walks the ACPI namespace
looking for ACPI processor objects, invokes _OSC for each of them with
bit 12 set in the capabilities buffer, and terminates the namespace
walk on the first success.
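
The "stop on first success" behaviour can be modelled without ACPICA.
The following is a minimal sketch, assuming a plain array stands in for
the processor objects and a boolean per object stands in for the
firmware's answer. All names are hypothetical. As in the patch, the
callback reports termination on the visit that follows a success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the callback's return codes. */
enum walk_status { WALK_CONTINUE, WALK_TERMINATE };

struct walk_state {
	bool acked;     /* set once any processor object acknowledges */
	int  osc_calls; /* how many times _OSC was notionally invoked */
};

/* Per-object callback; fw_acks simulates the firmware acking bit 12. */
static enum walk_status visit_processor(bool fw_acks, struct walk_state *st)
{
	if (st->acked)
		return WALK_TERMINATE; /* an earlier object already succeeded */
	st->osc_calls++;
	if (fw_acks)
		st->acked = true;
	return WALK_CONTINUE;
}

/* Visit objects in order, stopping as soon as the callback asks to. */
static void walk_processors(const bool *fw_acks, size_t n,
			    struct walk_state *st)
{
	for (size_t i = 0; i < n; i++)
		if (visit_processor(fw_acks[i], st) == WALK_TERMINATE)
			break;
}
```

Because the state persists between walks, a second walk after a
successful one makes no further _OSC calls in this model.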

Also modify intel_thermal_interrupt() to clear HWP status bits in
the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
them from firing continuously).

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject & changelog, function rename ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(back-ported from commit a21211672c9a1d730a39aa65d4a5b3414700adfb)
[ kamal: backport to 3.19: wrmsrl_safe needs ULL arg for gcc warning ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
---
 arch/x86/kernel/cpu/mcheck/therm_throt.c |  3 ++
 drivers/acpi/acpi_processor.c            | 52 ++++++++++++++++++++++++++++++++
 drivers/acpi/bus.c                       |  3 ++
 drivers/acpi/internal.h                  |  6 ++++
 4 files changed, 64 insertions(+)

Comments

Stefan Bader April 5, 2016, 3:29 p.m. UTC | #1
Wasn't that said to be only needed back to 15.10/Wily?
Kamal Mostafa April 5, 2016, 3:45 p.m. UTC | #2
On Tue, Apr 05, 2016 at 05:29:05PM +0200, Stefan Bader wrote:
> Wasn't that said to be only needed back to 15.10/Wily?

Comment #5 in the bug report says "Since 15.10 has SKL HW-P support
also, this patch should be backported into 15.10 kernel also."

I believe "SKL HW-P support" == X86_FEATURE_HWP, which is in Vivid too
(Vivid is the first Ubuntu kernel to carry it).  So I think Vivid
probably needs this fix too.

 -Kamal
Stefan Bader April 5, 2016, 3:49 p.m. UTC | #3
On 05.04.2016 17:45, Kamal Mostafa wrote:
> On Tue, Apr 05, 2016 at 05:29:05PM +0200, Stefan Bader wrote:
>> Wasn't that said to be only needed back to 15.10/Wily?
> 
> Comment #5 in the bug report says "Since 15.10 has SKL HW-P support
> also, this patch should be backported into 15.10 kernel also."
> 
> I believe "SKL HW-P support" == X86_FEATURE_HWP, which is in Vivid too
> (Vivid is the first Ubuntu kernel to carry it).  So I think Vivid
> probably needs this fix too.
> 
>  -Kamal
> 
Ok, if you say so. The patch looked the same as the Wily one (plus the
documented change).

-Stefan
Tim Gardner April 5, 2016, 4:03 p.m. UTC | #4

Kamal Mostafa April 5, 2016, 4:11 p.m. UTC | #5

Patch

diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 1af51b1..0db45aa 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -385,6 +385,9 @@  static void intel_thermal_interrupt(void)
 {
 	__u64 msr_val;
 
+	if (static_cpu_has(X86_FEATURE_HWP))
+		wrmsrl_safe(MSR_HWP_STATUS, 0ULL);
+
 	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
 
 	/* Check for violation of core thermal thresholds*/
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 1020b1b..ee17817 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -474,6 +474,58 @@  static void acpi_processor_remove(struct acpi_device *device)
 }
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
+#ifdef CONFIG_X86
+static bool acpi_hwp_native_thermal_lvt_set;
+static acpi_status __init acpi_hwp_native_thermal_lvt_osc(acpi_handle handle,
+							  u32 lvl,
+							  void *context,
+							  void **rv)
+{
+	u8 sb_uuid_str[] = "4077A616-290C-47BE-9EBD-D87058713953";
+	u32 capbuf[2];
+	struct acpi_osc_context osc_context = {
+		.uuid_str = sb_uuid_str,
+		.rev = 1,
+		.cap.length = 8,
+		.cap.pointer = capbuf,
+	};
+
+	if (acpi_hwp_native_thermal_lvt_set)
+		return AE_CTRL_TERMINATE;
+
+	capbuf[0] = 0x0000;
+	capbuf[1] = 0x1000; /* set bit 12 */
+
+	if (ACPI_SUCCESS(acpi_run_osc(handle, &osc_context))) {
+		if (osc_context.ret.pointer && osc_context.ret.length > 1) {
+			u32 *capbuf_ret = osc_context.ret.pointer;
+
+			if (capbuf_ret[1] & 0x1000) {
+				acpi_handle_info(handle,
+					"_OSC native thermal LVT Acked\n");
+				acpi_hwp_native_thermal_lvt_set = true;
+			}
+		}
+		kfree(osc_context.ret.pointer);
+	}
+
+	return AE_OK;
+}
+
+void __init acpi_early_processor_osc(void)
+{
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+				    ACPI_UINT32_MAX,
+				    acpi_hwp_native_thermal_lvt_osc,
+				    NULL, NULL, NULL);
+		acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID,
+				 acpi_hwp_native_thermal_lvt_osc,
+				 NULL, NULL);
+	}
+}
+#endif
+
 /*
  * The following ACPI IDs are known to be suitable for representing as
  * processor devices.
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index cd4598b..ec85614 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -609,6 +609,9 @@  static int __init acpi_bus_init(void)
 		goto error1;
 	}
 
+	/* Set capability bits for _OSC under processor scope */
+	acpi_early_processor_osc();
+
 	/*
 	 * _OSC method may exist in module level code,
 	 * so it must be run after ACPI_FULL_INITIALIZATION
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index ccbf353..08fd420 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -114,6 +114,12 @@  void acpi_early_processor_set_pdc(void);
 static inline void acpi_early_processor_set_pdc(void) {}
 #endif
 
+#ifdef CONFIG_X86
+void acpi_early_processor_osc(void);
+#else
+static inline void acpi_early_processor_osc(void) {}
+#endif
+
 /* --------------------------------------------------------------------------
                                   Embedded Controller
    -------------------------------------------------------------------------- */