[0/9] ipmi-watchdog: Fixes for error handling and general cleanups

Message ID 20180524001335.15457-1-wak@google.com

Message

William Kennington May 24, 2018, 12:13 a.m. UTC
The current watchdog implementation doesn't do a good job of handling
error cases when the BMC side restarts or crashes. This set of patches
tries to improve the robustness of the ipmi watchdog code such that
recovery happens when possible.

This series also adds a patch which enables the watchdog for the KEXEC
payload since the kernel being executed is guaranteed to support handling
the watchdog.

It also adds some general cleanups to the code that made the above
easier to implement.

William A. Kennington III (9):
  ipmi-watchdog: WD_POWER_CYCLE_ACTION -> WD_RESET_ACTION
  ipmi-watchdog: Make it possible to set DONT_STOP
  ipmi-watchdog: Don't reset the watchdog twice
  ipmi-watchdog: Don't disable at shutdown
  ipmi-watchdog: Add a flag to determine if we are still ticking
  ipmi-watchdog: The stop action should disable reset
  ipmi-watchdog: Simplify our completion function
  ipmi-watchdog: Support resetting the watchdog after set
  ipmi-watchdog: Support handling re-initialization

 hw/ipmi/ipmi-watchdog.c | 126 +++++++++++++++++++++++++++++++---------
 1 file changed, 98 insertions(+), 28 deletions(-)

Comments

Stewart Smith June 5, 2018, 4:23 a.m. UTC | #1
"William A. Kennington III" <wak@google.com> writes:
> The current watchdog implementation doesn't do a good job of handling
> error cases when the BMC side restarts or crashes. This set of patches
> tries to improve the robustness of the ipmi watchdog code such that
> recovery happens when possible.
>
> This series also adds a patch which enables the watchdog for the KEXEC
> payload since the kernel being executed is guaranteed to support handling
> the watchdog.
>
> It also adds some general cleanups to the code that made the above
> easier to implement.
>
> William A. Kennington III (9):
>   ipmi-watchdog: WD_POWER_CYCLE_ACTION -> WD_RESET_ACTION
>   ipmi-watchdog: Make it possible to set DONT_STOP
>   ipmi-watchdog: Don't reset the watchdog twice

I kind of meant to try this on a p8 with dodgy BMC before I hit merge,
but I didn't. So, I'll try now and see how it goes.

>   ipmi-watchdog: Don't disable at shutdown

I've added 5b70462c73a803d15982fe6f2c6dad89b8a9c962 so that, by default,
we don't leave it enabled as we exit; I just want to give people some
time to catch up with their BOOTKERNELs.

I'm not sure when the best time is to revert my patch and force the
issue though... maybe in an op-build cycle or two?

>   ipmi-watchdog: Add a flag to determine if we are still ticking
>   ipmi-watchdog: The stop action should disable reset
>   ipmi-watchdog: Simplify our completion function
>   ipmi-watchdog: Support resetting the watchdog after set
>   ipmi-watchdog: Support handling re-initialization

Anyway, series merged to master as of e6e74c53ed64eb029cb669fbb6715ee4077cf0b2

William Kennington June 5, 2018, 4:56 a.m. UTC | #2
Thanks, let me know how the testing goes. I'll add a reminder for
myself to take a look at reverting
5b70462c73a803d15982fe6f2c6dad89b8a9c962 in a couple months.