Message ID | 20170912194147.862865570@linutronix.de (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | None | expand |
Hi Thomas, Thomas Gleixner <tglx@linutronix.de> writes: > Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure() > need to be done in two steps. > > 1) Stop all NMIs > 2) Read the new parameters and start NMIs > > Right now watchdog_nmi_reconfigure() is a combination of both. To allow a > clean reconfiguration add a 'run' argument and split the functionality in > powerpc. > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Cc: Don Zickus <dzickus@redhat.com> > Cc: Chris Metcalf <cmetcalf@mellanox.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Sebastian Siewior <bigeasy@linutronix.de> > Cc: Nicholas Piggin <npiggin@gmail.com> > Cc: Ulrich Obergfell <uobergfe@redhat.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: linuxppc-dev@lists.ozlabs.org > Link: http://lkml.kernel.org/r/20170831073054.740462115@linutronix.de > > --- > arch/powerpc/kernel/watchdog.c | 17 +++++++++-------- > include/linux/nmi.h | 2 ++ > kernel/watchdog.c | 31 ++++++++++++++++++++++--------- > 3 files changed, 33 insertions(+), 17 deletions(-) Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc because we're calling it multiple times for the boot CPU. The first call is via: start_wd_on_cpu+0x80/0x2f0 watchdog_nmi_reconfigure+0x124/0x170 softlockup_reconfigure_threads+0x110/0x130 lockup_detector_init+0xbc/0xe0 kernel_init_freeable+0x18c/0x37c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0xbc And then again via the CPU hotplug registration: start_wd_on_cpu+0x80/0x2f0 cpuhp_invoke_callback+0x194/0x620 cpuhp_thread_fun+0x7c/0x1b0 smpboot_thread_fn+0x290/0x2a0 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xbc The first call is new because previously watchdog_nmi_reconfigure() wasn't called from softlockup_reconfigure_threads(). I'm not sure what the easiest fix is. One option would be to just drop the WARN_ON, it's just there for paranoia AFAICS. cheers > > --- a/arch/powerpc/kernel/watchdog.c > +++ b/arch/powerpc/kernel/watchdog.c > @@ -355,17 +355,18 @@ static void watchdog_calc_timeouts(void) > wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5; > } > > -void watchdog_nmi_reconfigure(void) > +void watchdog_nmi_reconfigure(bool run) > { > int cpu; > > - watchdog_calc_timeouts(); > - > - for_each_cpu(cpu, &wd_cpus_enabled) > - stop_wd_on_cpu(cpu); > - > - for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) > - start_wd_on_cpu(cpu); > + if (!run) { > + for_each_cpu(cpu, &wd_cpus_enabled) > + stop_wd_on_cpu(cpu); > + } else { > + watchdog_calc_timeouts(); > + for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) > + start_wd_on_cpu(cpu); > + } > } > > /* > --- a/include/linux/nmi.h > +++ b/include/linux/nmi.h > @@ -103,6 +103,8 @@ static inline void arch_touch_nmi_watchd > #endif > #endif > > +void watchdog_nmi_reconfigure(bool run); > + > /** > * touch_nmi_watchdog - restart NMI watchdog timeout. > * > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -112,17 +112,25 @@ void __weak watchdog_nmi_disable(unsigne > hardlockup_detector_perf_disable(); > } > > -/* > - * watchdog_nmi_reconfigure can be implemented to be notified after any > - * watchdog configuration change. The arch hardlockup watchdog should > - * respond to the following variables: > +/** > + * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs > + * @run: If false stop the watchdogs on all enabled CPUs > + * If true start the watchdogs on all enabled CPUs > + * > + * The core call order is: > + * watchdog_nmi_reconfigure(false); > + * update_variables(); > + * watchdog_nmi_reconfigure(true); > + * > + * The second call which starts the watchdogs again guarantees that the > + * following variables are stable across the call. > * - watchdog_enabled > * - watchdog_thresh > * - watchdog_cpumask > - * - sysctl_hardlockup_all_cpu_backtrace > - * - hardlockup_panic > + * > + * After the call the variables can be changed again. > */ > -void __weak watchdog_nmi_reconfigure(void) { } > +void __weak watchdog_nmi_reconfigure(bool run) { } > > #ifdef CONFIG_SOFTLOCKUP_DETECTOR > > @@ -515,10 +523,12 @@ static void softlockup_unpark_threads(vo > > static void softlockup_reconfigure_threads(bool enabled) > { > + watchdog_nmi_reconfigure(false); > softlockup_park_all_threads(); > set_sample_period(); > if (enabled) > softlockup_unpark_threads(); > + watchdog_nmi_reconfigure(true); > } > > /* > @@ -559,7 +569,11 @@ static inline void watchdog_unpark_threa > static inline int watchdog_enable_all_cpus(void) { return 0; } > static inline void watchdog_disable_all_cpus(void) { } > static inline void softlockup_init_threads(void) { } > -static inline void softlockup_reconfigure_threads(bool enabled) { } > +static void softlockup_reconfigure_threads(bool enabled) > +{ > + watchdog_nmi_reconfigure(false); > + watchdog_nmi_reconfigure(true); > +} > #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */ > > static void __lockup_detector_cleanup(void) > @@ -599,7 +613,6 @@ static void proc_watchdog_update(void) > /* Remove impossible cpus to keep sysctl output clean. */ > cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask); > softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh); > - watchdog_nmi_reconfigure(); > } > > /*
On Tue, 3 Oct 2017, Michael Ellerman wrote: > Hi Thomas, > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc > because we're calling it multiple times for the boot CPU. > > The first call is via: > > start_wd_on_cpu+0x80/0x2f0 > watchdog_nmi_reconfigure+0x124/0x170 > softlockup_reconfigure_threads+0x110/0x130 > lockup_detector_init+0xbc/0xe0 > kernel_init_freeable+0x18c/0x37c > kernel_init+0x2c/0x160 > ret_from_kernel_thread+0x5c/0xbc > > And then again via the CPU hotplug registration: > > start_wd_on_cpu+0x80/0x2f0 > cpuhp_invoke_callback+0x194/0x620 > cpuhp_thread_fun+0x7c/0x1b0 > smpboot_thread_fn+0x290/0x2a0 > kthread+0x168/0x1b0 > ret_from_kernel_thread+0x5c/0xbc > > > The first call is new because previously watchdog_nmi_reconfigure() > wasn't called from softlockup_reconfigure_threads(). Hmm, don't you have the same problem with CPU hotplug or do you just get lucky because the hotplug callback in your code is ordered vs. the softlockup thread hotplug callback in a way that this does not hit? > I'm not sure what the easiest fix is. One option would be to just drop > the WARN_ON, it's just there for paranoia AFAICS. The straight forward way is to make use of the new probe function. Patch below. Thanks, tglx 8<------------------ --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -375,20 +375,18 @@ void watchdog_nmi_start(void) /* * This runs after lockup_detector_init() which sets up watchdog_cpumask. */ -static int __init powerpc_watchdog_init(void) +int __init watchdog_nmi_probe(void) { int err; - watchdog_calc_timeouts(); - - err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/watchdog:online", - start_wd_on_cpu, stop_wd_on_cpu); + err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, + "powerpc/watchdog:online", + start_wd_on_cpu, stop_wd_on_cpu); if (err < 0) pr_warn("Watchdog could not be initialized"); return 0; } -arch_initcall(powerpc_watchdog_init); static void handle_backtrace_ipi(struct pt_regs *regs) { --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -608,7 +608,6 @@ static inline int watchdog_park_threads( static inline void watchdog_unpark_threads(void) { } static inline int watchdog_enable_all_cpus(void) { return 0; } static inline void watchdog_disable_all_cpus(void) { } -static inline void softlockup_init_threads(void) { } static void softlockup_reconfigure_threads(void) { cpus_read_lock(); @@ -617,6 +616,10 @@ static void softlockup_reconfigure_threa watchdog_nmi_start(); cpus_read_unlock(); } +static inline void softlockup_init_threads(void) +{ + softlockup_reconfigure_threads(); +} #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */ static void __lockup_detector_cleanup(void)
On Tue, 3 Oct 2017, Thomas Gleixner wrote: > On Tue, 3 Oct 2017, Michael Ellerman wrote: > > Hi Thomas, > > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc > > because we're calling it multiple times for the boot CPU. > > > > The first call is via: > > > > start_wd_on_cpu+0x80/0x2f0 > > watchdog_nmi_reconfigure+0x124/0x170 > > softlockup_reconfigure_threads+0x110/0x130 > > lockup_detector_init+0xbc/0xe0 > > kernel_init_freeable+0x18c/0x37c > > kernel_init+0x2c/0x160 > > ret_from_kernel_thread+0x5c/0xbc > > > > And then again via the CPU hotplug registration: > > > > start_wd_on_cpu+0x80/0x2f0 > > cpuhp_invoke_callback+0x194/0x620 > > cpuhp_thread_fun+0x7c/0x1b0 > > smpboot_thread_fn+0x290/0x2a0 > > kthread+0x168/0x1b0 > > ret_from_kernel_thread+0x5c/0xbc > > > > > > The first call is new because previously watchdog_nmi_reconfigure() > > wasn't called from softlockup_reconfigure_threads(). > > Hmm, don't you have the same problem with CPU hotplug or do you just get > lucky because the hotplug callback in your code is ordered vs. the > softlockup thread hotplug callback in a way that this does not hit? Which leads me to the question why you need the hotplug state at all if the softlockup detector is enabled. Wouldn't it make more sense to only register the state if softlockup detector is turned off in Kconfig and actually move it to the core code? Thanks, tglx
On Tue, 3 Oct 2017 09:04:03 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > On Tue, 3 Oct 2017, Thomas Gleixner wrote: > > On Tue, 3 Oct 2017, Michael Ellerman wrote: > > > Hi Thomas, > > > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc > > > because we're calling it multiple times for the boot CPU. > > > > > > The first call is via: > > > > > > start_wd_on_cpu+0x80/0x2f0 > > > watchdog_nmi_reconfigure+0x124/0x170 > > > softlockup_reconfigure_threads+0x110/0x130 > > > lockup_detector_init+0xbc/0xe0 > > > kernel_init_freeable+0x18c/0x37c > > > kernel_init+0x2c/0x160 > > > ret_from_kernel_thread+0x5c/0xbc > > > > > > And then again via the CPU hotplug registration: > > > > > > start_wd_on_cpu+0x80/0x2f0 > > > cpuhp_invoke_callback+0x194/0x620 > > > cpuhp_thread_fun+0x7c/0x1b0 > > > smpboot_thread_fn+0x290/0x2a0 > > > kthread+0x168/0x1b0 > > > ret_from_kernel_thread+0x5c/0xbc > > > > > > > > > The first call is new because previously watchdog_nmi_reconfigure() > > > wasn't called from softlockup_reconfigure_threads(). > > > > Hmm, don't you have the same problem with CPU hotplug or do you just get > > lucky because the hotplug callback in your code is ordered vs. the > > softlockup thread hotplug callback in a way that this does not hit? I had the idea that it watchdog_nmi_reconfigure() being only called with get_online_cpus held would prevent hotplug callbacks running. > > Which leads me to the question why you need the hotplug state at all if the > softlockup detector is enabled. Wouldn't it make more sense to only > register the state if softlockup detector is turned off in Kconfig and > actually move it to the core code? I don't understand what you mean exactly, but it was done to avoid relying on the softlockup detector at all, because it wasn't needed for anything else (unlike the perf lockup detector). Thanks, Nick
On Tue, 3 Oct 2017, Nicholas Piggin wrote: > On Tue, 3 Oct 2017 09:04:03 +0200 (CEST) > Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Tue, 3 Oct 2017, Thomas Gleixner wrote: > > > On Tue, 3 Oct 2017, Michael Ellerman wrote: > > > > Hi Thomas, > > > > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc > > > > because we're calling it multiple times for the boot CPU. > > > > > > > > The first call is via: > > > > > > > > start_wd_on_cpu+0x80/0x2f0 > > > > watchdog_nmi_reconfigure+0x124/0x170 > > > > softlockup_reconfigure_threads+0x110/0x130 > > > > lockup_detector_init+0xbc/0xe0 > > > > kernel_init_freeable+0x18c/0x37c > > > > kernel_init+0x2c/0x160 > > > > ret_from_kernel_thread+0x5c/0xbc > > > > > > > > And then again via the CPU hotplug registration: > > > > > > > > start_wd_on_cpu+0x80/0x2f0 > > > > cpuhp_invoke_callback+0x194/0x620 > > > > cpuhp_thread_fun+0x7c/0x1b0 > > > > smpboot_thread_fn+0x290/0x2a0 > > > > kthread+0x168/0x1b0 > > > > ret_from_kernel_thread+0x5c/0xbc > > > > > > > > > > > > The first call is new because previously watchdog_nmi_reconfigure() > > > > wasn't called from softlockup_reconfigure_threads(). > > > > > > Hmm, don't you have the same problem with CPU hotplug or do you just get > > > lucky because the hotplug callback in your code is ordered vs. the > > > softlockup thread hotplug callback in a way that this does not hit? > > I had the idea that it watchdog_nmi_reconfigure() being only called > with get_online_cpus held would prevent hotplug callbacks running. > > > > > Which leads me to the question why you need the hotplug state at all if the > > softlockup detector is enabled. Wouldn't it make more sense to only > > register the state if softlockup detector is turned off in Kconfig and > > actually move it to the core code? > > I don't understand what you mean exactly, but it was done to avoid > relying on the softlockup detector at all, because it wasn't needed > for anything else (unlike the perf lockup detector). If the softlockup detector is enabled along with your hardlockup detector then the current code in mainline invokes watchdog_nmi_enable(cpu), which is a weak function and as I just noticed not implemented by powerpc. So it's a non issue because it's not implemented. Thanks, tglx
Thomas Gleixner <tglx@linutronix.de> writes: > On Tue, 3 Oct 2017, Michael Ellerman wrote: >> Hi Thomas, >> Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc >> because we're calling it multiple times for the boot CPU. >> >> The first call is via: >> >> start_wd_on_cpu+0x80/0x2f0 >> watchdog_nmi_reconfigure+0x124/0x170 >> softlockup_reconfigure_threads+0x110/0x130 >> lockup_detector_init+0xbc/0xe0 >> kernel_init_freeable+0x18c/0x37c >> kernel_init+0x2c/0x160 >> ret_from_kernel_thread+0x5c/0xbc >> >> And then again via the CPU hotplug registration: >> >> start_wd_on_cpu+0x80/0x2f0 >> cpuhp_invoke_callback+0x194/0x620 >> cpuhp_thread_fun+0x7c/0x1b0 >> smpboot_thread_fn+0x290/0x2a0 >> kthread+0x168/0x1b0 >> ret_from_kernel_thread+0x5c/0xbc >> >> >> The first call is new because previously watchdog_nmi_reconfigure() >> wasn't called from softlockup_reconfigure_threads(). > > Hmm, don't you have the same problem with CPU hotplug or do you just get > lucky because the hotplug callback in your code is ordered vs. the > softlockup thread hotplug callback in a way that this does not hit? I don't see it with CPU hotplug. AFAICS that's because softlockup_reconfigure_threads() isn't called for CPU hotplug. Unless there's a path I'm missing? >> I'm not sure what the easiest fix is. One option would be to just drop >> the WARN_ON, it's just there for paranoia AFAICS. > > The straight forward way is to make use of the new probe function. Patch > below. Thanks. Hmm, I tried that patch, it makes the warning go away. But then I triggered a deliberate hard lockup and got nothing. Then I went back to the existing code (in linux-next), and I still get no warning from a deliberate hard lockup. So seems there may be some more gremlins. Will test more in the morning. cheers
On Tue, 3 Oct 2017, Michael Ellerman wrote: > Thomas Gleixner <tglx@linutronix.de> writes: > >> The first call is new because previously watchdog_nmi_reconfigure() > >> wasn't called from softlockup_reconfigure_threads(). > > > > Hmm, don't you have the same problem with CPU hotplug or do you just get > > lucky because the hotplug callback in your code is ordered vs. the > > softlockup thread hotplug callback in a way that this does not hit? > > I don't see it with CPU hotplug. > > AFAICS that's because softlockup_reconfigure_threads() isn't called for > CPU hotplug. Unless there's a path I'm missing? As I said in the other reply, I assumed that its called via watchdog_nmi_enable(cpu), but that's a weak function which is not implemented on power. So no issue. > >> I'm not sure what the easiest fix is. One option would be to just drop > >> the WARN_ON, it's just there for paranoia AFAICS. > > > > The straight forward way is to make use of the new probe function. Patch > > below. > > Hmm, I tried that patch, it makes the warning go away. But then I > triggered a deliberate hard lockup and got nothing. > > Then I went back to the existing code (in linux-next), and I still get > no warning from a deliberate hard lockup. > > So seems there may be some more gremlins. Will test more in the morning. Hrm. That's weird. I'll have a look and send a proper patch series on top of next. Thanks, tglx
On Tue, 3 Oct 2017, Thomas Gleixner wrote: > On Tue, 3 Oct 2017, Michael Ellerman wrote: > > Hmm, I tried that patch, it makes the warning go away. But then I > > triggered a deliberate hard lockup and got nothing. > > > > Then I went back to the existing code (in linux-next), and I still get > > no warning from a deliberate hard lockup. > > > > So seems there may be some more gremlins. Will test more in the morning. > > Hrm. That's weird. I'll have a look and send a proper patch series on top > of next. The major difference is that the reworked code utilizes watchdog_nmi_reconfigure() for both init and the sysctl updates, but I can't for my life figure out why that doesn't work. Thanks, tglx
On Tue, 3 Oct 2017, Thomas Gleixner wrote: > On Tue, 3 Oct 2017, Thomas Gleixner wrote: > > On Tue, 3 Oct 2017, Michael Ellerman wrote: > > > Hmm, I tried that patch, it makes the warning go away. But then I > > > triggered a deliberate hard lockup and got nothing. > > > > > > Then I went back to the existing code (in linux-next), and I still get > > > no warning from a deliberate hard lockup. > > > > > > So seems there may be some more gremlins. Will test more in the morning. > > > > Hrm. That's weird. I'll have a look and send a proper patch series on top > > of next. > > The major difference is that the reworked code utilizes > watchdog_nmi_reconfigure() for both init and the sysctl updates, but I > can't for my life figure out why that doesn't work. I collected the changes which Linus requested along with the nmi_probe() one and pushed them into: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/urgent That's based on 4.13 final so it neither contains 4.14 nor -next material. Thanks, tglx
Thomas Gleixner <tglx@linutronix.de> writes: > On Tue, 3 Oct 2017, Thomas Gleixner wrote: >> On Tue, 3 Oct 2017, Thomas Gleixner wrote: >> > On Tue, 3 Oct 2017, Michael Ellerman wrote: >> > > Hmm, I tried that patch, it makes the warning go away. But then I >> > > triggered a deliberate hard lockup and got nothing. >> > > >> > > Then I went back to the existing code (in linux-next), and I still get >> > > no warning from a deliberate hard lockup. >> > > >> > > So seems there may be some more gremlins. Will test more in the morning. >> > >> > Hrm. That's weird. I'll have a look and send a proper patch series on top >> > of next. >> >> The major difference is that the reworked code utilizes >> watchdog_nmi_reconfigure() for both init and the sysctl updates, but I >> can't for my life figure out why that doesn't work. > > I collected the changes which Linus requested along with the nmi_probe() > one and pushed them into: > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/urgent > > That's based on 4.13 final so it neither contains 4.14 nor -next material. Thanks. I tested that here and it seems fine. The warning at boot is gone and it is correctly catching a hard lockup triggered via LKDTM, eg: # mount -t debugfs none /sys/kernel/debug # echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT lkdtm: Performing direct entry HARDLOCKUP Watchdog CPU:0 Hard LOCKUP Modules linked in: CPU: 0 PID: 1215 Comm: sh Not tainted 4.13.0-gcc6-11846-g86be5ee #162 task: c0000000f1fc4c00 task.stack: c0000000ee3ac000 NIP: c0000000007205a4 LR: c00000000071f950 CTR: c000000000720570 REGS: c00000003ffffd80 TRAP: 0900 Not tainted (4.13.0-gcc6-11846-g86be5ee) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002228 XER: 00000000 CFAR: c0000000007205a8 SOFTE: 0 GPR00: c00000000071f950 c0000000ee3afbb0 c00000000107cf00 c0000000010604f0 GPR04: c0000000ffa05d90 c0000000ffa1c968 0000000000000000 0000000000000000 GPR08: 0000000000000007 0000000000000001 0000000000000000 9000000030001003 GPR12: c000000000720570 c00000000fd40000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0 GPR20: 000001002f5a3485 00000000100b8f90 0000000000000000 0000000000000000 GPR24: c000000001060778 c0000000ee3afe00 c0000000ee3afe00 c0000000010603b0 GPR28: 000000000000000b c0000000f1fe0000 0000000000000140 c0000000010604f0 NIP [c0000000007205a4] lkdtm_HARDLOCKUP+0x34/0x40 LR [c00000000071f950] lkdtm_do_action+0x50/0x70 Call Trace: [c0000000ee3afbb0] [0000000000000140] 0x140 (unreliable) [c0000000ee3afbd0] [c00000000071f950] lkdtm_do_action+0x50/0x70 [c0000000ee3afc00] [c00000000071fdc0] direct_entry+0x110/0x1b0 [c0000000ee3afc90] [c00000000050141c] full_proxy_write+0x9c/0x110 [c0000000ee3afcf0] [c000000000336a3c] __vfs_write+0x6c/0x210 [c0000000ee3afd90] [c000000000338960] vfs_write+0xd0/0x270 [c0000000ee3afde0] [c00000000033a93c] SyS_write+0x6c/0x110 [c0000000ee3afe30] [c00000000000b220] system_call+0x58/0x6c Instruction dump: 3842c990 7c0802a6 f8010010 f821ffe1 60000000 60000000 39400000 892d027a 994d027a 60000000 60420000 7c210b78 <7c421378> 4bfffff8 60420000 3c4c0096 Kernel panic - not syncing: Hard LOCKUP Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) cheers
On Tue, Oct 03, 2017 at 07:27:01PM +0000, Thomas Gleixner wrote: > On Tue, 3 Oct 2017, Thomas Gleixner wrote: > > On Tue, 3 Oct 2017, Thomas Gleixner wrote: > > > On Tue, 3 Oct 2017, Michael Ellerman wrote: > > > > Hmm, I tried that patch, it makes the warning go away. But then I > > > > triggered a deliberate hard lockup and got nothing. > > > > > > > > Then I went back to the existing code (in linux-next), and I still get > > > > no warning from a deliberate hard lockup. > > > > > > > > So seems there may be some more gremlins. Will test more in the morning. > > > > > > Hrm. That's weird. I'll have a look and send a proper patch series on top > > > of next. > > > > The major difference is that the reworked code utilizes > > watchdog_nmi_reconfigure() for both init and the sysctl updates, but I > > can't for my life figure out why that doesn't work. > > I collected the changes which Linus requested along with the nmi_probe() > one and pushed them into: > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/urgent > > That's based on 4.13 final so it neither contains 4.14 nor -next material. Tested your changes on 4.14-rc3 and it passes my tests. Thanks! Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com>
--- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -355,17 +355,18 @@ static void watchdog_calc_timeouts(void) wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5; } -void watchdog_nmi_reconfigure(void) +void watchdog_nmi_reconfigure(bool run) { int cpu; - watchdog_calc_timeouts(); - - for_each_cpu(cpu, &wd_cpus_enabled) - stop_wd_on_cpu(cpu); - - for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) - start_wd_on_cpu(cpu); + if (!run) { + for_each_cpu(cpu, &wd_cpus_enabled) + stop_wd_on_cpu(cpu); + } else { + watchdog_calc_timeouts(); + for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask) + start_wd_on_cpu(cpu); + } } /* --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -103,6 +103,8 @@ static inline void arch_touch_nmi_watchd #endif #endif +void watchdog_nmi_reconfigure(bool run); + /** * touch_nmi_watchdog - restart NMI watchdog timeout. * --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -112,17 +112,25 @@ void __weak watchdog_nmi_disable(unsigne hardlockup_detector_perf_disable(); } -/* - * watchdog_nmi_reconfigure can be implemented to be notified after any - * watchdog configuration change. The arch hardlockup watchdog should - * respond to the following variables: +/** + * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs + * @run: If false stop the watchdogs on all enabled CPUs + * If true start the watchdogs on all enabled CPUs + * + * The core call order is: + * watchdog_nmi_reconfigure(false); + * update_variables(); + * watchdog_nmi_reconfigure(true); + * + * The second call which starts the watchdogs again guarantees that the + * following variables are stable across the call. * - watchdog_enabled * - watchdog_thresh * - watchdog_cpumask - * - sysctl_hardlockup_all_cpu_backtrace - * - hardlockup_panic + * + * After the call the variables can be changed again. */ -void __weak watchdog_nmi_reconfigure(void) { } +void __weak watchdog_nmi_reconfigure(bool run) { } #ifdef CONFIG_SOFTLOCKUP_DETECTOR @@ -515,10 +523,12 @@ static void softlockup_unpark_threads(vo static void softlockup_reconfigure_threads(bool enabled) { + watchdog_nmi_reconfigure(false); softlockup_park_all_threads(); set_sample_period(); if (enabled) softlockup_unpark_threads(); + watchdog_nmi_reconfigure(true); } /* @@ -559,7 +569,11 @@ static inline void watchdog_unpark_threa static inline int watchdog_enable_all_cpus(void) { return 0; } static inline void watchdog_disable_all_cpus(void) { } static inline void softlockup_init_threads(void) { } -static inline void softlockup_reconfigure_threads(bool enabled) { } +static void softlockup_reconfigure_threads(bool enabled) +{ + watchdog_nmi_reconfigure(false); + watchdog_nmi_reconfigure(true); +} #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */ static void __lockup_detector_cleanup(void) @@ -599,7 +613,6 @@ static void proc_watchdog_update(void) /* Remove impossible cpus to keep sysctl output clean. */ cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask); softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh); - watchdog_nmi_reconfigure(); } /*
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure() need to be done in two steps. 1) Stop all NMIs 2) Read the new parameters and start NMIs Right now watchdog_nmi_reconfigure() is a combination of both. To allow a clean reconfiguration add a 'run' argument and split the functionality in powerpc. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Don Zickus <dzickus@redhat.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170831073054.740462115@linutronix.de --- arch/powerpc/kernel/watchdog.c | 17 +++++++++-------- include/linux/nmi.h | 2 ++ kernel/watchdog.c | 31 ++++++++++++++++++++++--------- 3 files changed, 33 insertions(+), 17 deletions(-)