diff mbox series

[SRU,mantic:linux,1/1] x86/tsc: Extend watchdog check exemption to 4-Sockets platform

Message ID 20240227141037.1744821-2-philip.cox@canonical.com
State New
Headers show
Series x86: performance: tsc: Extend watchdog check exemption to 4-Sockets platform | expand

Commit Message

Philip Cox Feb. 27, 2024, 2:10 p.m. UTC
From: Feng Tang <feng.tang@intel.com>

BugLink: https://bugs.launchpad.net/bugs/2054699

There were reports again that the tsc clocksource on 4 sockets x86
servers was wrongly judged as 'unstable' by 'jiffies' and other
watchdogs, and disabled [1][2].

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduce to deal with these false
alarms of tsc unstable issues, covering qualified platforms for 2
sockets or smaller ones. And from history of chasing TSC issues,
Thomas and Peter only saw real TSC synchronization issue on 8 socket
machines.

So extend the exemption to 4 sockets to fix the issue.

Rui also proposed another way to disable 'jiffies' as clocksource
watchdog [3], which can also solve problem in [1]. in an architecture
independent way, but can't cure the problem in [2]. whose watchdog
is HPET or PMTIMER, while 'jiffies' is mostly used as watchdog in
boot phase.

'nr_online_nodes' has known inaccurate problem for cases like
platform with cpu-less memory nodes, sub numa cluster enabled,
fakenuma, kernel cmdline parameter 'maxcpus=', etc. The harmful case
is the 'maxcpus' one which could possibly under estimates the package
number, and disable the watchdog, but bright side is it is mostly
for debug usage. All these will be addressed in other patches, as
discussed in thread [4].

[1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
[2]. https://lore.kernel.org/lkml/06df410c-2177-4671-832f-339cff05b1d9@paulmck-laptop/
[3]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
[4]. https://lore.kernel.org/all/20221021062131.1826810-1-feng.tang@intel.com/

Reported-by: Yu Liao <liaoyu15@huawei.com>
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 233756a640be811efae33763db718fe29753b1e9)
Signed-off-by: Philip Cox <philip.cox@canonical.com>
---
 arch/x86/kernel/tsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3425c6a943e4..15f97c0abc9d 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1258,7 +1258,7 @@  static void __init check_system_tsc_reliable(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 2)
+	    nr_online_nodes <= 4)
 		tsc_disable_clocksource_watchdog();
 }