diff mbox series

[2/7] watchdog/hardlockup: Make the config checks more straightforward

Message ID 20230607152432.5435-3-pmladek@suse.com (mailing list archive)
State Handled Elsewhere
Headers show
Series watchdog/hardlockup: Cleanup configuration of hardlockup detectors | expand

Commit Message

Petr Mladek June 7, 2023, 3:24 p.m. UTC
There are four possible variants of hardlockup detectors:

  + buddy: available when SMP is set.

  + perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.

  + arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.

  + sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
	and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.

The check for the sparc64 variant is more complicated because
HAVE_NMI_WATCHDOG is used to #ifdef code used by both arch-specific
and sparc64 specific variant. Therefore it is automatically
selected with HAVE_HARDLOCKUP_DETECTOR_ARCH.

This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH.
It reduces the size of some checks but it makes them harder to follow.

Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH
is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global
HARDLOCKUP_DETECTOR switch is enabled/disabled.

Make the logic more straightforward by the following changes:

  + Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and
    HAVE_NMI_WATCHDOG in comments.

  + Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate
    HAVE_* for all four hardlockup detector variants.

    Use it in the other conditions instead of SMP. It makes it
    clear that it is about the buddy detector.

  + Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR
    and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand
    the conditions between the four hardlockup detector variants.

  + Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY
    can be enabled. It explains the dependency on the other
    hardlockup detector variants.

    Also it allows to remove HARDLOCKUP_DETECTOR_NON_ARCH by using "imply".
    It triggers re-evaluating HARDLOCKUP_DETECTOR_PERF/BUDDY when
    the global HARDLOCKUP_DETECTOR switch is changed.

  + Add dependency on HARDLOCKUP_DETECTOR so that the affected variables
    disappear when the hardlockup detectors are disabled.

    Another nice side effect is that HARDLOCKUP_DETECTOR_PREFER_BUDDY
    value is not preserved when the global switch is disabled.
    The user has to make the decision when it is enabled again.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 arch/Kconfig      | 22 ++++++++++++-----
 lib/Kconfig.debug | 63 ++++++++++++++++++++++++++++-------------------
 2 files changed, 53 insertions(+), 32 deletions(-)

Comments

Doug Anderson June 7, 2023, 11:35 p.m. UTC | #1
Hi,

On Wed, Jun 7, 2023 at 8:25 AM Petr Mladek <pmladek@suse.com> wrote:
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 422f0ffa269e..13c6e596cf9e 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -404,17 +404,27 @@ config HAVE_NMI_WATCHDOG
>         depends on HAVE_NMI
>         bool
>         help
> -         The arch provides a low level NMI watchdog. It provides
> -         asm/nmi.h, and defines its own watchdog_hardlockup_probe() and
> -         arch_touch_nmi_watchdog().
> +         The arch provides its own hardlockup detector implementation instead
> +         of the generic perf one.

nit: did you mean to have different wording here compared to
HAVE_HARDLOCKUP_DETECTOR_ARCH? Here you say "the generic perf one" and
there you say "the generic ones", though it seems like you mean them
to be the same.

> +
> +         Sparc64 defines this variable without HAVE_HARDLOCKUP_DETECTOR_ARCH.
> +         It does _not_ use the command line parameters and sysctl interface
> +         used by generic hardlockup detectors. Instead it is enabled/disabled
> +         by the top-level watchdog interface that is common for both softlockup
> +         and hardlockup detectors.
>
>  config HAVE_HARDLOCKUP_DETECTOR_ARCH
>         bool
>         select HAVE_NMI_WATCHDOG
>         help
> -         The arch chooses to provide its own hardlockup detector, which is
> -         a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
> -         interfaces and parameters provided by hardlockup detector subsystem.
> +         The arch provides its own hardlockup detector implementation instead
> +         of the generic ones.
> +
> +         It uses the same command line parameters, and sysctl interface,
> +         as the generic hardlockup detectors.
> +
> +         HAVE_NMI_WATCHDOG is selected to build the code shared with
> +         the sparc64 specific implementation.
>
>  config HAVE_PERF_REGS
>         bool
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 3e91fa33c7a0..d201f5d3876b 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1035,16 +1035,33 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>
>           Say N if unsure.
>
> +config HAVE_HARDLOCKUP_DETECTOR_BUDDY
> +       bool
> +       depends on SMP
> +       default y

I think you simplify your life if you also add:

  depends on !HAVE_NMI_WATCHDOG

The existing config system has always assumed that no architecture
defines both HAVE_HARDLOCKUP_DETECTOR_PERF and HAVE_NMI_WATCHDOG
(symbols would have clashed and you would get a link error as two
watchdogs try to implement the same weak symbol). If you add the extra
dependency to "buddy" as per above, then a few things below fall out.
I'll try to point them out below.


> +
>  #
> -# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
> -# lockup detector rather than the perf based detector.
> +# Global switch whether to build a hardlockup detector at all. It is available
> +# only when the architecture supports at least one implementation. There are
> +# two exceptions. The hardlockup detector is newer enabled on:

s/newer/never/


> +#
> +#      s390: it reported many false positives there
> +#
> +#      sparc64: has a custom implementation which is not using the common
> +#              hardlockup command line options and sysctl interface.
> +#
> +# Note that HAVE_NMI_WATCHDOG is used to distinguish the sparc64 specific
> +# implementaion. It is automatically enabled also for other arch-specific
> +# variants which set HAVE_HARDLOCKUP_DETECTOR_ARCH. It makes the check
> +# of avaialable and supported variants quite tricky.
>  #
>  config HARDLOCKUP_DETECTOR
>         bool "Detect Hard Lockups"
>         depends on DEBUG_KERNEL && !S390
> -       depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> +       depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH

Adding the dependency to buddy (see ablove) would simplify the above
to just this:

depends on HAVE_HARDLOCKUP_DETECTOR_PERF ||
HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH

As per above, it's simply a responsibility of architectures not to
define that they have both "perf" if they have the NMI watchdog, so
it's just buddy to worry about.


> +       imply HARDLOCKUP_DETECTOR_PERF
> +       imply HARDLOCKUP_DETECTOR_BUDDY
>         select LOCKUP_DETECTOR
> -       select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
>
>         help
>           Say Y here to enable the kernel to act as a watchdog to detect
> @@ -1055,9 +1072,15 @@ config HARDLOCKUP_DETECTOR
>           chance to run.  The current stack trace is displayed upon detection
>           and the system will stay locked up.
>
> +#
> +# Note that arch-specific variants are always preferred.
> +#
>  config HARDLOCKUP_DETECTOR_PREFER_BUDDY
>         bool "Prefer the buddy CPU hardlockup detector"
> -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
> +       depends on HARDLOCKUP_DETECTOR
> +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> +       depends on !HAVE_NMI_WATCHDOG

Can get rid of above "!HAVE_NMI_WATCHDOG" if it's added to
HAVE_HARDLOCKUP_DETECTOR_BUDDY.


> +       default n

I'm pretty sure "default n" isn't needed.


>         help
>           Say Y here to prefer the buddy hardlockup detector over the perf one.
>
> @@ -1071,39 +1094,27 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
>
>  config HARDLOCKUP_DETECTOR_PERF
>         bool
> -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> +       depends on HARDLOCKUP_DETECTOR
> +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +       depends on !HAVE_NMI_WATCHDOG

We don't really need the "!HAVE_NMI_WATCHDOG". A user wouldn't be able
to mess this up and it's up to an architecture not to define both
HAVE_HARDLOCKUP_DETECTOR_PERF and HAVE_NMI_WATCHDOG.


>         select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
>
>  config HARDLOCKUP_DETECTOR_BUDDY
>         bool
> -       depends on SMP
> +       depends on HARDLOCKUP_DETECTOR
> +       depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
> +       depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +       depends on !HAVE_NMI_WATCHDOG

Get rid of "!HAVE_NMI_WATCHDOG" and it should be in
HAVE_HARDLOCKUP_DETECTOR_BUDDY


Overall, though, I agree that this improves things! Thanks! :-)
Petr Mladek June 8, 2023, 11:02 a.m. UTC | #2
On Wed 2023-06-07 16:35:09, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jun 7, 2023 at 8:25 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 422f0ffa269e..13c6e596cf9e 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -404,17 +404,27 @@ config HAVE_NMI_WATCHDOG
> >         depends on HAVE_NMI
> >         bool
> >         help
> > -         The arch provides a low level NMI watchdog. It provides
> > -         asm/nmi.h, and defines its own watchdog_hardlockup_probe() and
> > -         arch_touch_nmi_watchdog().
> > +         The arch provides its own hardlockup detector implementation instead
> > +         of the generic perf one.
> 
> nit: did you mean to have different wording here compared to
> HAVE_HARDLOCKUP_DETECTOR_ARCH? Here you say "the generic perf one" and
> there you say "the generic ones", though it seems like you mean them
> to be the same.

Good point, I'll fix it in v2.

> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 3e91fa33c7a0..d201f5d3876b 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1035,16 +1035,33 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
> >
> >           Say N if unsure.
> >
> > +config HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > +       bool
> > +       depends on SMP
> > +       default y
> 
> I think you simplify your life if you also add:
> 
>   depends on !HAVE_NMI_WATCHDOG
> 
> The existing config system has always assumed that no architecture
> defines both HAVE_HARDLOCKUP_DETECTOR_PERF and HAVE_NMI_WATCHDOG
> (symbols would have clashed and you would get a link error as two
> watchdogs try to implement the same weak symbol). If you add the extra
> dependency to "buddy" as per above, then a few things below fall out.
> I'll try to point them out below.

My aim is to have two variables with and without HAVE_* prefix
for each variant. They already existed for perf:

   + HAVE_HARDLOCKUP_DETECTOR_PERF means that it is supported by the arch.
   + HARDLOCKUP_DETECTOR_PERF means that it will really be built.

1. It will make it clear what variants are available on the give system.
   And it will make it clear what variant is used in the end.

2. It allows to add the dependecy on the global switch HARDLOCKUP_DETECTOR.
   It makes it clear that the detector could be disabled by this switch.
   Also it actually allows to use both top-bottom logic and C-like
   definition ordering.


> 
> > +
> >  #
> > -# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
> > -# lockup detector rather than the perf based detector.
> > +# Global switch whether to build a hardlockup detector at all. It is available
> > +# only when the architecture supports at least one implementation. There are
> > +# two exceptions. The hardlockup detector is newer enabled on:
> 
> s/newer/never/

Great catch. Will fix in v2.

> > +#
> > +#      s390: it reported many false positives there
> > +#
> > +#      sparc64: has a custom implementation which is not using the common
> > +#              hardlockup command line options and sysctl interface.
> > +#
> > +# Note that HAVE_NMI_WATCHDOG is used to distinguish the sparc64 specific
> > +# implementaion. It is automatically enabled also for other arch-specific
> > +# variants which set HAVE_HARDLOCKUP_DETECTOR_ARCH. It makes the check
> > +# of avaialable and supported variants quite tricky.
> >  #
> >  config HARDLOCKUP_DETECTOR
> >         bool "Detect Hard Lockups"
> >         depends on DEBUG_KERNEL && !S390
> > -       depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > +       depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
> 
> Adding the dependency to buddy (see ablove) would simplify the above
> to just this:
> 
> depends on HAVE_HARDLOCKUP_DETECTOR_PERF ||
> HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH

This is exactly what I do not want. It would just move the check
somewhere else. But it would make the logic harder to understand.
Especially when the related definitions could not be found by cscope.

> As per above, it's simply a responsibility of architectures not to
> define that they have both "perf" if they have the NMI watchdog, so
> it's just buddy to worry about.

Where is this documented, please?
Is it safe to assume this?

I would personally prefer to ensure this by the config check.
It is even better than documentation because nobody reads
documentation ;-)

It took me long time to understand all the dependencies and
possibilities. My motivation is that neither me nor others
would need to absolve the same adventure in the future.

> 
> > +       imply HARDLOCKUP_DETECTOR_PERF
> > +       imply HARDLOCKUP_DETECTOR_BUDDY
> >         select LOCKUP_DETECTOR
> > -       select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> >
> >         help
> >           Say Y here to enable the kernel to act as a watchdog to detect
> > @@ -1055,9 +1072,15 @@ config HARDLOCKUP_DETECTOR
> >           chance to run.  The current stack trace is displayed upon detection
> >           and the system will stay locked up.
> >
> > +#
> > +# Note that arch-specific variants are always preferred.
> > +#
> >  config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> >         bool "Prefer the buddy CPU hardlockup detector"
> > -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
> > +       depends on HARDLOCKUP_DETECTOR
> > +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
> > +       depends on !HAVE_NMI_WATCHDOG
> 
> Can get rid of above "!HAVE_NMI_WATCHDOG" if it's added to
> HAVE_HARDLOCKUP_DETECTOR_BUDDY.

I would prefer to keep it hear to make it clear on the first look.

> > +       default n
> 
> I'm pretty sure "default n" isn't needed.

I'll try and fix in v2.

> 
> >         help
> >           Say Y here to prefer the buddy hardlockup detector over the perf one.
> >
> > @@ -1071,39 +1094,27 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> >
> >  config HARDLOCKUP_DETECTOR_PERF
> >         bool
> > -       depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> > +       depends on HARDLOCKUP_DETECTOR
> > +       depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > +       depends on !HAVE_NMI_WATCHDOG
> 
> We don't really need the "!HAVE_NMI_WATCHDOG". A user wouldn't be able
> to mess this up and it's up to an architecture not to define both
> HAVE_HARDLOCKUP_DETECTOR_PERF and HAVE_NMI_WATCHDOG.

This is an assumption that only few people knows. I would prefer
to enforce this by the check.

> Overall, though, I agree that this improves things! Thanks! :-)

Thanks a lot for the review. I did my best and I got lost many times ;-)

Best Regards,
Petr
Doug Anderson June 8, 2023, 1:55 p.m. UTC | #3
Hi,

On Thu, Jun 8, 2023 at 4:02 AM Petr Mladek <pmladek@suse.com> wrote:
>
> > >  config HARDLOCKUP_DETECTOR
> > >         bool "Detect Hard Lockups"
> > >         depends on DEBUG_KERNEL && !S390
> > > -       depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > > +       depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
> >
> > Adding the dependency to buddy (see ablove) would simplify the above
> > to just this:
> >
> > depends on HAVE_HARDLOCKUP_DETECTOR_PERF ||
> > HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
>
> This is exactly what I do not want. It would just move the check
> somewhere else. But it would make the logic harder to understand.

Hmmm. To me, it felt easier to understand by moving this into the
"HAVE_HARDLOCKUP_DETECTOR_BUDDY". To me it was pretty easy to say "if
an architecture defined its own arch-specific watchdog then buddy
can't be enabled" and that felt like it fit cleanly within the
"HAVE_HARDLOCKUP_DETECTOR_BUDDY" definition. It got rid of _a lot_ of
other special cases / checks elsewhere and felt quite a bit cleaner to
me. I only had to think about the conflict between the "buddy" and
"nmi" watchdogs once when I understood
"HAVE_HARDLOCKUP_DETECTOR_BUDDY".


> > As per above, it's simply a responsibility of architectures not to
> > define that they have both "perf" if they have the NMI watchdog, so
> > it's just buddy to worry about.
>
> Where is this documented, please?
> Is it safe to assume this?

It's not well documented and I agree that it could be improved. Right
now, HAVE_NMI_WATCHDOG is documented to say that the architecture
"defines its own arch_touch_nmi_watchdog()". Looking before my
patches, you can see that "kernel/watchdog_hld.c" (the "perf" detector
code) unconditionally defines arch_touch_nmi_watchdog(). That would
give you a linker error.


> I would personally prefer to ensure this by the config check.
> It is even better than documentation because nobody reads
> documentation ;-)

Sure. IMO this should be documented as close as possible to the root
of the problem. Make "HAVE_NMI_WATCHDOG" depend on
"!HAVE_HARDLOCKUP_DETECTOR_PERF". That expresses that an architecture
is not allowed to declare that it has both.
Petr Mladek June 14, 2023, 10:29 a.m. UTC | #4
On Thu 2023-06-08 06:55:23, Doug Anderson wrote:
> Hi,
> 
> On Thu, Jun 8, 2023 at 4:02 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > > >  config HARDLOCKUP_DETECTOR
> > > >         bool "Detect Hard Lockups"
> > > >         depends on DEBUG_KERNEL && !S390
> > > > -       depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > > > +       depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > >
> > > Adding the dependency to buddy (see ablove) would simplify the above
> > > to just this:
> > >
> > > depends on HAVE_HARDLOCKUP_DETECTOR_PERF ||
> > > HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
> >
> > This is exactly what I do not want. It would just move the check
> > somewhere else. But it would make the logic harder to understand.
> 
> Hmmm. To me, it felt easier to understand by moving this into the
> "HAVE_HARDLOCKUP_DETECTOR_BUDDY". To me it was pretty easy to say "if
> an architecture defined its own arch-specific watchdog then buddy
> can't be enabled" and that felt like it fit cleanly within the
> "HAVE_HARDLOCKUP_DETECTOR_BUDDY" definition. It got rid of _a lot_ of
> other special cases / checks elsewhere and felt quite a bit cleaner to
> me. I only had to think about the conflict between the "buddy" and
> "nmi" watchdogs once when I understood
> "HAVE_HARDLOCKUP_DETECTOR_BUDDY".

I see. My problem with this approach was that the dependencies between
the 4 alternative implementations were too distributed. It was
necessary read many definitions to understand what was possible and
what was not possible. And it is even more complicated when
cscope does not support Kconfig.

Also the above solves the buddy detector which is global.

The same conflict has PERF which has arch-specific dependencies.
Maybe, it can be disabled by a conflict in the arch/Kconfig.
But then the PERF dependencies would be split into 3 config
files: arch/Kconfig, lib/Kconfig.debug, arch/Kconfig/.

Anyway, HAVE_*_BUDDY and HAVE_*_PERF should behave the same.
Both should either explicitly conflict with HAVE_*_ARCH
and HAVE_NMI_WATCHDOG. Or they both should be enabled when
they are supported by the architecture and just ignored when
choosing the final implementation.

My wish was to have consistent naming:

   + HAVE_HARDLOCKUP_DETECTOR_<impl> set when the the architecture
       supports the particular implementation.

  + HARDLOCKUP_DETECTOR_<impl> set when the implementation will
       be used (built).


Step aside:

It seems that we have entered into a bike shedding mode.
The following questions come to my mind:

   1. Does this patchset improve the current state?

   2. Maybe, it is not black&white. Is it possible to summarize
      what exactly got better and what got worse?

Maybe, there is no need to do bike-shedding about every step
if the final result is reasonable and the steps are not
completely wrong.

I just followed my intuition and tried to do some changes step
by step. I got lost many times so maybe the steps are not
ideal. Anyway, the steps helped me to understand the logic
and stay reasonably confident that they did not change
the behavior.

I could rework the patchset. But I first need to know what
exactly is bad in the result. And eventually if there is more
logical way how to end there.

Best Regards,
Petr
Doug Anderson June 14, 2023, 1:47 p.m. UTC | #5
Hi,

On Wed, Jun 14, 2023 at 3:29 AM Petr Mladek <pmladek@suse.com> wrote:
>
> It seems that we have entered into a bike shedding mode.
> The following questions come to my mind:
>
>    1. Does this patchset improve the current state?
>
>    2. Maybe, it is not black&white. Is it possible to summarize
>       what exactly got better and what got worse?
>
> Maybe, there is no need to do bike-shedding about every step
> if the final result is reasonable and the steps are not
> completely wrong.
>
> I just followed my intuition and tried to do some changes step
> by step. I got lost many times so maybe the steps are not
> ideal. Anyway, the steps helped me to understand the logic
> and stay reasonably confident that they did not change
> the behavior.
>
> I could rework the patchset. But I first need to know what
> exactly is bad in the result. And eventually if there is more
> logical way how to end there.

Sure. I still feel like the end result of the CONFIG options after
your whole patchset is easier to understand / cleaner by adjusting the
dependencies as I have suggested. That being said, I agree that it is
the type of thing that can be more a matter of personal preference. I
do agree that, even if you don't take my suggestion of adjusting the
dependencies, the end result of your patchset still makes things
better than they were.

...so if you really feel strongly that things are more understandable
with the dependencies specified as you have, I won't stand in the way.
I still think you need a v2, though, just to address other nits.

-Doug
diff mbox series

Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index 422f0ffa269e..13c6e596cf9e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -404,17 +404,27 @@  config HAVE_NMI_WATCHDOG
 	depends on HAVE_NMI
 	bool
 	help
-	  The arch provides a low level NMI watchdog. It provides
-	  asm/nmi.h, and defines its own watchdog_hardlockup_probe() and
-	  arch_touch_nmi_watchdog().
+	  The arch provides its own hardlockup detector implementation instead
+	  of the generic perf one.
+
+	  Sparc64 defines this variable without HAVE_HARDLOCKUP_DETECTOR_ARCH.
+	  It does _not_ use the command line parameters and sysctl interface
+	  used by generic hardlockup detectors. Instead it is enabled/disabled
+	  by the top-level watchdog interface that is common for both softlockup
+	  and hardlockup detectors.
 
 config HAVE_HARDLOCKUP_DETECTOR_ARCH
 	bool
 	select HAVE_NMI_WATCHDOG
 	help
-	  The arch chooses to provide its own hardlockup detector, which is
-	  a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
-	  interfaces and parameters provided by hardlockup detector subsystem.
+	  The arch provides its own hardlockup detector implementation instead
+	  of the generic ones.
+
+	  It uses the same command line parameters, and sysctl interface,
+	  as the generic hardlockup detectors.
+
+	  HAVE_NMI_WATCHDOG is selected to build the code shared with
+	  the sparc64 specific implementation.
 
 config HAVE_PERF_REGS
 	bool
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3e91fa33c7a0..d201f5d3876b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1035,16 +1035,33 @@  config BOOTPARAM_SOFTLOCKUP_PANIC
 
 	  Say N if unsure.
 
+config HAVE_HARDLOCKUP_DETECTOR_BUDDY
+	bool
+	depends on SMP
+	default y
+
 #
-# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
-# lockup detector rather than the perf based detector.
+# Global switch whether to build a hardlockup detector at all. It is available
+# only when the architecture supports at least one implementation. There are
+# two exceptions. The hardlockup detector is newer enabled on:
+#
+#	s390: it reported many false positives there
+#
+#	sparc64: has a custom implementation which is not using the common
+#		hardlockup command line options and sysctl interface.
+#
+# Note that HAVE_NMI_WATCHDOG is used to distinguish the sparc64 specific
+# implementaion. It is automatically enabled also for other arch-specific
+# variants which set HAVE_HARDLOCKUP_DETECTOR_ARCH. It makes the check
+# of avaialable and supported variants quite tricky.
 #
 config HARDLOCKUP_DETECTOR
 	bool "Detect Hard Lockups"
 	depends on DEBUG_KERNEL && !S390
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	imply HARDLOCKUP_DETECTOR_PERF
+	imply HARDLOCKUP_DETECTOR_BUDDY
 	select LOCKUP_DETECTOR
-	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
 
 	help
 	  Say Y here to enable the kernel to act as a watchdog to detect
@@ -1055,9 +1072,15 @@  config HARDLOCKUP_DETECTOR
 	  chance to run.  The current stack trace is displayed upon detection
 	  and the system will stay locked up.
 
+#
+# Note that arch-specific variants are always preferred.
+#
 config HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	bool "Prefer the buddy CPU hardlockup detector"
-	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
+	depends on HARDLOCKUP_DETECTOR
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY
+	depends on !HAVE_NMI_WATCHDOG
+	default n
 	help
 	  Say Y here to prefer the buddy hardlockup detector over the perf one.
 
@@ -1071,39 +1094,27 @@  config HARDLOCKUP_DETECTOR_PREFER_BUDDY
 
 config HARDLOCKUP_DETECTOR_PERF
 	bool
-	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
+	depends on HARDLOCKUP_DETECTOR
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	depends on !HAVE_NMI_WATCHDOG
 	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
 
 config HARDLOCKUP_DETECTOR_BUDDY
 	bool
-	depends on SMP
+	depends on HARDLOCKUP_DETECTOR
+	depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
+	depends on !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	depends on !HAVE_NMI_WATCHDOG
 	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
 
+#
 # Both the "perf" and "buddy" hardlockup detectors count hrtimer
 # interrupts. This config enables functions managing this common code.
+#
 config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
 	bool
 	select SOFTLOCKUP_DETECTOR
 
-# For hardlockup detectors you can have one directly provided by the arch
-# or use a "non-arch" one. If you're using a "non-arch" one that is
-# further divided the perf hardlockup detector (which, confusingly, needs
-# arch-provided perf support) and the buddy hardlockup detector (which just
-# needs SMP). In either case, using the "non-arch" code conflicts with
-# the NMI watchdog code (which is sometimes used directly and sometimes used
-# by the arch-provided hardlockup detector).
-config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
-	bool
-	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
-	default y
-
-# This will select the appropriate non-arch hardlockdup detector
-config HARDLOCKUP_DETECTOR_NON_ARCH
-	bool
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
-	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
-	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
-
 #
 # Enables a timestamp based low pass filter to compensate for perf based
 # hard lockup detection which runs too fast due to turbo modes.