diff mbox series

powerpc, wdt: disable ratelimiting when disabling interrupts

Message ID 20210409141226.2093980-1-rasmus.villemoes@prevas.dk
State Superseded
Delegated to: Priyanka Jain
Headers show
Series powerpc, wdt: disable ratelimiting when disabling interrupts | expand

Commit Message

Rasmus Villemoes April 9, 2021, 2:12 p.m. UTC
On powerpc, time as measured by get_timer() ceases to pass when
interrupts are disabled (since on powerpc get_timer() returns the
value of a volatile variable that gets updated via a timer
interrupt). That in turn means the watchdog_reset() function provided
by CONFIG_WDT ceases to work due to the ratelimiting it imposes.

Normally, interrupts are just disabled very briefly. However, during
bootm, they are disabled for good prior to decompressing the kernel
image, which can be a somewhat time-consuming operation. Even when we
manage to decompress the kernel and do the other preparation steps and
hand over control to the kernel, the kernel also takes some time
before it is ready to assume responsibility for handling the
watchdog. The end result is that the board gets reset prematurely.

The ratelimiting isn't really strictly needed (prior to DM WDT, no
such thing existed), so just disable it when we know that time no
longer passes and have watchdog_reset() (e.g. called from
decompression loop) unconditionally reset the watchdog timer.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---

I previously sent a patch to change the ratelimiting to be based on
get_ticks() instead of get_timer(), but that has gone nowhere
[1]. This is an alternative which only affects powerpc (and only
boards that have enabled CONFIG_WDT). I hope the watchdog maintainers
will accept at least one of these, or suggest a third alternative, so
I don't have to keep some out-of-tree patch applied without knowing if
that's the direction upstream will take.

[1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/



 arch/powerpc/lib/interrupts.c | 3 +++
 drivers/watchdog/wdt-uclass.c | 8 +++++++-
 include/wdt.h                 | 6 ++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Christophe Leroy April 9, 2021, 2:37 p.m. UTC | #1
Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit :
> On powerpc, time as measured by get_timer() ceases to pass when
> interrupts are disabled (since on powerpc get_timer() returns the
> value of a volatile variable that gets updated via a timer
> interrupt). That in turn means the watchdog_reset() function provided
> by CONFIG_WDT ceases to work due to the ratelimiting it imposes.
> 
> Normally, interrupts are just disabled very briefly. However, during
> bootm, they are disabled for good prior to decompressing the kernel
> image, which can be a somewhat time-consuming operation. Even when we
> manage to decompress the kernel and do the other preparation steps and
> hand over control to the kernel, the kernel also takes some time
> before it is ready to assume responsibility for handling the
> watchdog. The end result is that the board gets reset prematurely.
> 
> The ratelimiting isn't really strictly needed (prior to DM WDT, no
> such thing existed), so just disable it when we know that time no
> longer passes and have watchdog_reset() (e.g. called from
> decompression loop) unconditionally reset the watchdog timer.


Do we need to make it that complicated ? I think before the generic implementation, powerpc didn't 
have a rate limitation at all for pinging the watchdog, why not go back this direction, all the time ?

I mean we could simply set reset_period to 0 at all time for powerpc ( and change the test to 
time_after_eq() instead of time_after() ).


> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
> 
> I previously sent a patch to change the ratelimiting to be based on
> get_ticks() instead of get_timer(), but that has gone nowhere
> [1]. This is an alternative which only affects powerpc (and only
> boards that have enabled CONFIG_WDT). I hope the watchdog maintainers
> will accept at least one of these, or suggest a third alternative, so
> I don't have to keep some out-of-tree patch applied without knowing if
> that's the direction upstream will take.
> 
> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/
> 
>
Rasmus Villemoes April 12, 2021, 7:04 a.m. UTC | #2
On 09/04/2021 16.37, Christophe Leroy wrote:
> 
> 
> Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit :

>> The ratelimiting isn't really strictly needed (prior to DM WDT, no
>> such thing existed), so just disable it when we know that time no
>> longer passes and have watchdog_reset() (e.g. called from
>> decompression loop) unconditionally reset the watchdog timer.
> 
> 
> Do we need to make it that complicated ? I think before the generic
> implementation, powerpc didn't have a rate limitation at all for pinging
> the watchdog, why not go back this direction, all the time ?
> 
> I mean we could simply set reset_period to 0 at all time for powerpc (
> and change the test to time_after_eq() instead of time_after() ).

Maybe. I think I'd still prefer the one that changes the rate-limiting
to be based on get_ticks() instead of get_timer(), which did seem to
work everywhere.

IIRC, Stefan previously rejected having a config knob or board hook for
disabling the ratelimiting - changing the code [the s/after/after_eq/]
to effectively allow that by passing hw_margin_ms=0 in DT would probably
work, but seems quite hacky (especially when DT is supposed to describe
hardware).

I've also floated the option of making get_timer() return something
sensible even with interrupts off on ppc - that would also solve the
problem, and would benefit other generic code that uses get_timer()
without knowing its limitation on ppc.
https://lists.denx.de/pipermail/u-boot/2020-June/414842.html .

So, there are lots of ways of solving it. Which direction to take really
depends on which viewpoint one has:

(1) the watchdog code should, if rate-limiting at all, use an
implementation that works on all platforms [the get_ticks option]
(2) the watchdog code should, if rate-limiting at all, provide a way for
a board/platform to override or disable that
(3) get_timer() on ppc is broken, it needs to provide sensible results
at all times, even when interrupts are disabled for tens or hundreds of
milliseconds.

Rasmus
Stefan Roese April 12, 2021, 8:57 a.m. UTC | #3
On 12.04.21 09:04, Rasmus Villemoes wrote:
> On 09/04/2021 16.37, Christophe Leroy wrote:
>>
>>
>> Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit :
> 
>>> The ratelimiting isn't really strictly needed (prior to DM WDT, no
>>> such thing existed), so just disable it when we know that time no
>>> longer passes and have watchdog_reset() (e.g. called from
>>> decompression loop) unconditionally reset the watchdog timer.
>>
>>
>> Do we need to make it that complicated ? I think before the generic
>> implementation, powerpc didn't have a rate limitation at all for pinging
>> the watchdog, why not go back this direction, all the time ?
>>
>> I mean we could simply set reset_period to 0 at all time for powerpc (
>> and change the test to time_after_eq() instead of time_after() ).
> 
> Maybe. I think I'd still prefer the one that changes the rate-limiting
> to be based on get_ticks() instead of get_timer(), which did seem to
> work everywhere.
> 
> IIRC, Stefan previously rejected having a config knob or board hook for
> disabling the ratelimiting - changing the code [the s/after/after_eq/]
> to effectively allow that by passing hw_margin_ms=0 in DT would probably
> work, but seems quite hacky (especially when DT is supposed to describe
> hardware).

I'm not sure if I explicitly rejected any patches, sorry it's been too
long. ;)

But I also find this idea from Christophe quite applealing, as it's
impact on the common code is minimal. Either you set the DT property
to 0. Or if you don't have the possibility to do this (no DT on your
platform?), then add a new Kconfig option to configure the default
value for "reset_period" via Kconfig.

> I've also floated the option of making get_timer() return something
> sensible even with interrupts off on ppc - that would also solve the
> problem, and would benefit other generic code that uses get_timer()
> without knowing its limitation on ppc.
> https://lists.denx.de/pipermail/u-boot/2020-June/414842.html .
> 
> So, there are lots of ways of solving it. Which direction to take really
> depends on which viewpoint one has:
> 
> (1) the watchdog code should, if rate-limiting at all, use an
> implementation that works on all platforms [the get_ticks option]
> (2) the watchdog code should, if rate-limiting at all, provide a way for
> a board/platform to override or disable that
> (3) get_timer() on ppc is broken, it needs to provide sensible results
> at all times, even when interrupts are disabled for tens or hundreds of
> milliseconds.

"hw_margin_ms=0" would be my favorite way right now (see above). If this
does not work for you, I'm also fine with the latest patch you sent to
the list.

Thanks,
Stefan
diff mbox series

Patch

diff --git a/arch/powerpc/lib/interrupts.c b/arch/powerpc/lib/interrupts.c
index 73f270002c..5c5b5fd7ff 100644
--- a/arch/powerpc/lib/interrupts.c
+++ b/arch/powerpc/lib/interrupts.c
@@ -11,6 +11,7 @@ 
 #include <irq_func.h>
 #include <asm/processor.h>
 #include <watchdog.h>
+#include <wdt.h>
 #ifdef CONFIG_LED_STATUS
 #include <status_led.h>
 #endif
@@ -43,6 +44,7 @@  static __inline__ void set_dec (unsigned long val)
 void enable_interrupts(void)
 {
 	set_msr (get_msr () | MSR_EE);
+	watchdog_ratelimit(1);
 }
 
 /* returns flag if MSR_EE was set before */
@@ -50,6 +52,7 @@  int disable_interrupts(void)
 {
 	ulong msr = get_msr ();
 
+	watchdog_ratelimit(0);
 	set_msr (msr & ~MSR_EE);
 	return ((msr & MSR_EE) != 0);
 }
diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 0603ffbd36..b70a9d50b8 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -131,6 +131,12 @@  int wdt_expire_now(struct udevice *dev, ulong flags)
 	return ret;
 }
 
+static int ratelimit = 1;
+void watchdog_ratelimit(int on)
+{
+	ratelimit = on;
+}
+
 #if defined(CONFIG_WATCHDOG)
 /*
  * Called by macro WATCHDOG_RESET. This function be called *very* early,
@@ -148,7 +154,7 @@  void watchdog_reset(void)
 
 	/* Do not reset the watchdog too often */
 	now = get_timer(0);
-	if (time_after(now, next_reset)) {
+	if (!ratelimit || time_after(now, next_reset)) {
 		next_reset = now + reset_period;
 		wdt_reset(gd->watchdog_dev);
 	}
diff --git a/include/wdt.h b/include/wdt.h
index bc242c2eb2..9ba1e62dcf 100644
--- a/include/wdt.h
+++ b/include/wdt.h
@@ -107,4 +107,10 @@  struct wdt_ops {
 
 int initr_watchdog(void);
 
+#if CONFIG_IS_ENABLED(WDT)
+void watchdog_ratelimit(int on);
+#else
+static inline void watchdog_ratelimit(int on) { }
+#endif
+
 #endif  /* _WDT_H_ */