[6/7] powerpc/eeh: Allow disabling recovery

Message ID 20190208030802.10805-6-oohall@gmail.com
State Changes Requested
Series
  • [1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes

Checks

Context                        Check     Description
snowpatch_ozlabs/checkpatch    warning   total: 0 errors, 0 warnings, 1 checks, 46 lines checked
snowpatch_ozlabs/apply_patch   success   next/apply_patch Successfully applied

Commit Message

Oliver O'Halloran Feb. 8, 2019, 3:08 a.m.
Currently, when we detect an error we automatically invoke the EEH recovery
handler. This can be annoying when debugging EEH problems, or when working
on EEH itself, so this patch adds a debugfs knob, eeh_disable_recovery, that
prevents a recovery event from being queued when an issue is detected.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/eeh.h  |  1 +
 arch/powerpc/kernel/eeh.c       | 11 +++++++++++
 arch/powerpc/kernel/eeh_event.c |  9 +++++++++
 3 files changed, 21 insertions(+)
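
The knob described in the commit message is an ordinary debugfs boolean, so it can be flipped from userspace. The following is a minimal sketch, not part of the patch: the helper names write_bool_knob() and read_bool_knob() are hypothetical, and the path assumes debugfs is mounted at /sys/kernel/debug with powerpc_debugfs_root being the "powerpc" directory beneath it.

```c
/* Userspace sketch: toggle a debugfs boolean knob such as eeh_disable_recovery. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * ASSUMPTION: debugfs is mounted at /sys/kernel/debug and
 * powerpc_debugfs_root is the "powerpc" directory there.
 */
#define EEH_NO_RECOVER_PATH "/sys/kernel/debug/powerpc/eeh_disable_recovery"

/* Write "1" or "0" to a boolean knob. Returns 0 on success, -1 on error. */
static int write_bool_knob(const char *path, bool value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(value ? "1\n" : "0\n", f) >= 0;
	/* fclose() flushes the buffer, so a failed write can surface here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Read a boolean knob back. Files made with debugfs_create_bool() report
 * "Y" or "N"; "1" and "0" are accepted as well.
 * Returns 1 (set), 0 (clear), or -1 on error.
 */
static int read_bool_knob(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	c = fgetc(f);
	fclose(f);

	if (c == 'Y' || c == 'y' || c == '1')
		return 1;
	if (c == 'N' || c == 'n' || c == '0')
		return 0;
	return -1;
}
```

Since the file is created with mode 0600, calling write_bool_knob(EEH_NO_RECOVER_PATH, true) would need to run as root; with the patch applied, later calls to eeh_send_failure_event() should then log "EEH: Event dropped due to no_recover setting" instead of queueing a recovery event.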

Comments

Michael Ellerman Feb. 8, 2019, 9:58 a.m. | #1
Oliver O'Halloran <oohall@gmail.com> writes:

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index d1f0bdf41fac..92809b137e39 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
>  					   &eeh_enable_dbgfs_ops);
>  		debugfs_create_u32("eeh_max_freezes", 0600,
>  				powerpc_debugfs_root, &eeh_max_freezes);
> +		debugfs_create_bool("eeh_disable_recovery", 0600,
> +				powerpc_debugfs_root,
> +				&eeh_debugfs_no_recover);
>  		eeh_cache_debugfs_init();
> +#endif

There's that endif.

When I'm doing rebasing and think I might have broken bisectability I
build every commit with:

  https://github.com/mpe/misc-scripts/blob/master/git/for-each-commit


cheers
Oliver O'Halloran Feb. 8, 2019, 12:52 p.m. | #2
On Fri, Feb 8, 2019 at 8:58 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Oliver O'Halloran <oohall@gmail.com> writes:
>
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index d1f0bdf41fac..92809b137e39 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
> >                                          &eeh_enable_dbgfs_ops);
> >               debugfs_create_u32("eeh_max_freezes", 0600,
> >                               powerpc_debugfs_root, &eeh_max_freezes);
> > +             debugfs_create_bool("eeh_disable_recovery", 0600,
> > +                             powerpc_debugfs_root,
> > +                             &eeh_debugfs_no_recover);
> >               eeh_cache_debugfs_init();
> > +#endif
>
> There's that endif.

Bleh

>
> When I'm doing rebasing and think I might have broken bisectability I
> build every commit with:
>
>   https://github.com/mpe/misc-scripts/blob/master/git/for-each-commit

Thanks, I have something similar for skiboot but never got around to
porting it to the kernel.

Patch

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fc21b6e78e91..6f6721561302 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -220,6 +220,7 @@  struct eeh_ops {
 
 extern int eeh_subsystem_flags;
 extern uint32_t eeh_max_freezes;
+extern bool eeh_debugfs_no_recover;
 extern struct eeh_ops *eeh_ops;
 extern raw_spinlock_t confirm_error_lock;
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d1f0bdf41fac..92809b137e39 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -111,6 +111,13 @@  EXPORT_SYMBOL(eeh_subsystem_flags);
  */
 uint32_t eeh_max_freezes = 5;
 
+/*
+ * Controls whether a recovery event should be scheduled when an
+ * isolated device is discovered. This is only really useful for
+ * debugging problems with the EEH core.
+ */
+bool eeh_debugfs_no_recover;
+
 /* Platform dependent EEH operations */
 struct eeh_ops *eeh_ops = NULL;
 
@@ -1810,7 +1817,11 @@  static int __init eeh_init_proc(void)
 					   &eeh_enable_dbgfs_ops);
 		debugfs_create_u32("eeh_max_freezes", 0600,
 				powerpc_debugfs_root, &eeh_max_freezes);
+		debugfs_create_bool("eeh_disable_recovery", 0600,
+				powerpc_debugfs_root,
+				&eeh_debugfs_no_recover);
 		eeh_cache_debugfs_init();
+#endif
 	}
 
 	return 0;
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 227e57f980df..19837798bb1d 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -126,6 +126,15 @@  int eeh_send_failure_event(struct eeh_pe *pe)
 	unsigned long flags;
 	struct eeh_event *event;
 
+	/*
+	 * If we've manually suppressed recovery events via debugfs
+	 * then just drop it on the floor.
+	 */
+	if (eeh_debugfs_no_recover) {
+		pr_err("EEH: Event dropped due to no_recover setting\n");
+		return 0;
+	}
+
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
 		pr_err("EEH: out of memory, event not handled\n");