watchdog: core: suppress "watchdog did not stop" message

Message ID 20181115234413.27009-1-taoren@fb.com
State New
Series
  • watchdog: core: suppress "watchdog did not stop" message

Commit Message

Tao Ren Nov. 15, 2018, 11:44 p.m.
Currently "watchdog did not stop!" message is printed when the watchdog
timer is not stopped at close. For example, people may see the message
when rebooting the system, or the message will be logged to console
periodically if watchdog is kicked by a script which runs "echo k >
/dev/watchdog" command.

Given a critical message usually indicates a serious hardware/software
failure, this message could easily lead to confusion, so it's better to
just delete the message.

Signed-off-by: Tao Ren <taoren@fb.com>
---
 drivers/watchdog/watchdog_dev.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Guenter Roeck Nov. 16, 2018, 12:19 a.m. | #1
On Thu, Nov 15, 2018 at 11:44:26PM +0000, Tao Ren wrote:
> Currently "watchdog did not stop!" message is printed when the watchdog
> timer is not stopped at close. For example, people may see the message
> when rebooting the system, or the message will be logged to console
> periodically if watchdog is kicked by a script which runs "echo k >
> /dev/watchdog" command.
> 
> Given a critical message usually indicates a serious hardware/software
> failure, this message could easily lead to confusion, so it's better to
> just delete the message.
> 
> Signed-off-by: Tao Ren <taoren@fb.com>

NACK. This message is displayed if/when the watchdog application
exits without stopping the watchdog and/or without closing properly.
This _is_ critical since it will reboot the system after the next
timeout period.

If userspace triggers this message on purpose (eg by the mentioned
script, which does not exit properly), userspace is at fault,
not the kernel.

Guenter
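
For illustration, here is a minimal userspace sketch of the behavior described above. It assumes the driver supports the "magic close" feature (WDIOF_MAGICCLOSE) and that nowayout is not set; the helper names wd_open(), wd_kick() and wd_close() are hypothetical and not part of any kernel or library API. The descriptor is opened once and kept open between pings, and the character 'V' is written as the last write before close() when the watchdog should be disarmed. Closing without the 'V' leaves the watchdog armed, which is the case in which the "watchdog did not stop!" message is printed.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the watchdog device once and keep the descriptor for the
 * lifetime of the process. Reopening it for every ping (as
 * "echo k > /dev/watchdog" does) closes it each time without
 * disarming the timer.
 */
int wd_open(const char *path)
{
	return open(path, O_WRONLY);
}

/* Any write to the device counts as a keepalive ping. */
int wd_kick(int fd)
{
	return write(fd, "k", 1) == 1 ? 0 : -1;
}

/*
 * Close the device. If disarm is non-zero, write the magic
 * character 'V' first, so that a driver with magic close support
 * (assumption: WDIOF_MAGICCLOSE, nowayout not set) stops the timer.
 * With disarm == 0 the watchdog is deliberately left running.
 */
int wd_close(int fd, int disarm)
{
	int ret = 0;

	if (disarm && write(fd, "V", 1) != 1)
		ret = -1;
	if (close(fd) < 0)
		ret = -1;
	return ret;
}
```

A daemon that wants the machine to reboot when it dies would call wd_close(fd, 0) or simply never close; in that case the message is expected and accurate. A clean shutdown path would call wd_close(fd, 1).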

> ---
>  drivers/watchdog/watchdog_dev.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index f6c24b22b37c..65e9ebbd8759 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -879,7 +879,6 @@ static int watchdog_release(struct inode *inode, struct file *file)
>  
>  	/* If the watchdog was not stopped, send a keepalive ping */
>  	if (err < 0) {
> -		pr_crit("watchdog%d: watchdog did not stop!\n", wdd->id);
>  		watchdog_ping(wdd);
>  	}
>  
> -- 
> 2.17.1
>

Tao Ren Nov. 16, 2018, 12:37 a.m. | #2
On 11/15/18 4:19 PM, Guenter Roeck wrote:
> NACK. This message is displayed if/when the watchdog application
> exits without stopping the watchdog and/or without closing properly.
> This _is_ critical since it will reboot the system after the next
> timeout period.
> 
> If userspace triggers this message on purpose (eg by the mentioned
> script, which does not exit properly), userspace is at fault,
> not the kernel.
> 
> Guenter

Thank you for the quick response, Guenter. I see the log each time I reboot my system, and when I searched for the message on Google, I also found posts asking why it is printed at reboot. That's why I find it confusing.

Anyway, please ignore the patch since the message is necessary.

Thanks,
Tao Ren

Jerry Hoemann Nov. 27, 2018, 1:31 a.m. | #3
On Fri, Nov 16, 2018 at 12:37:28AM +0000, Tao Ren wrote:
> On 11/15/18 4:19 PM, Guenter Roeck wrote:
> > NACK. This message is displayed if/when the watchdog application
> > exits without stopping the watchdog and/or without closing properly.
> > This _is_ critical since it will reboot the system after the next
> > timeout period.
> > 
> > If userspace triggers this message on purpose (eg by the mentioned
> > script, which does not exit properly), userspace is at fault,
> > not the kernel.
> > 
> > Guenter
> 
> Thank you for the quick response, Guenter. I see the log each time I reboot my system, and when I searched for the message on Google, I also found posts asking why it is printed at reboot. That's why I find it confusing.
> 
> Anyway, please ignore the patch since the message is necessary.

Tao,

If you're on a system running systemd, the default behavior is to
enable the watchdog during shutdown.  This guards against shutdown hanging.
So this message will be routinely printed out during orderly shutdown.


See file: /etc/systemd/system.conf
----------------------------------
...
# Entries in this file show the compile time defaults.
...

#ShutdownWatchdogSec=10min

Tao Ren Nov. 27, 2018, 6:11 a.m. | #4
On 11/26/18, 5:31 PM, "Jerry Hoemann" <jerry.hoemann@hpe.com> wrote:
> Tao,
> 
> If you're on a system running systemd, the default behavior is to
> enable the watchdog during shutdown.  This guards against shutdown hanging.
> So this message will be routinely printed out during orderly shutdown.

Thank you Jerry for the comments.

I actually use a separate daemon process to kick the watchdog on my BMC system. The daemon monitors temperature sensors and other system states and kicks the watchdog periodically: if the daemon gets stuck or exits, then the machine needs to reboot even if the kernel/systemd is fine. Perhaps I need to look for a better/official way to manage the watchdog device.

BTW, I will be travelling abroad in the next few days and may not be able to reply to emails promptly. Thank you again for jumping in.

Best regards,
Tao Ren

Patch

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index f6c24b22b37c..65e9ebbd8759 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -879,7 +879,6 @@ static int watchdog_release(struct inode *inode, struct file *file)
 
 	/* If the watchdog was not stopped, send a keepalive ping */
 	if (err < 0) {
-		pr_crit("watchdog%d: watchdog did not stop!\n", wdd->id);
 		watchdog_ping(wdd);
 	}