[v2,11/15] opal/hmi: Fix handling of TFMR parity/corrupt error.

Message ID 152390004962.2566.14004738930530913719.stgit@jupiter.in.ibm.com
State Accepted
  • opal/hmi: Rework HMI handling.

Commit Message

Mahesh J Salgaonkar April 16, 2018, 5:34 p.m.
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

While testing the TFMR parity/corrupt error, it was observed that HMIs are
delivered twice for this error:
- The first HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
- The second HMI is delivered with HMER[4,5]=1 and TFMR[60]=0, with a valid TB.

On the second HMI we end up printing the error message below even though
the TB is in a valid state:

	"HMI: TB invalid without core error reported"

This patch fixes the issue by ignoring HMER[5] and checking only TFMR[60]
before setting this_cpu()->tb_invalid to true.
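
The effect of the change on the two HMIs described above can be sketched in
isolation. This is a minimal illustration, not the skiboot code: the macro
and function names are hypothetical, the bit positions are taken from the
HMER[4,5] and TFMR[60] notation used in this message (IBM numbering, bit 0
is the most significant bit), and the pre-patch check is assumed to have
OR-ed HMER[5] with TFMR[60].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit value. */
#define PPC_BIT(n)	(1ULL << (63 - (n)))

/* Hypothetical names; bit positions follow the commit message. */
#define HMER_TFAC_ERROR		PPC_BIT(4)
#define HMER_TFMR_PARITY	PPC_BIT(5)
#define TFMR_CORRUPT		PPC_BIT(60)

/* Assumed pre-patch condition: either HMER[5] or TFMR[60] marks TB invalid. */
static bool tb_invalid_old(uint64_t hmer, uint64_t tfmr)
{
	return (hmer & HMER_TFMR_PARITY) || (tfmr & TFMR_CORRUPT);
}

/* Post-patch condition: only TFMR[60] is consulted. */
static bool tb_invalid_new(uint64_t tfmr)
{
	return (tfmr & TFMR_CORRUPT) != 0;
}

/* Replays the two HMIs from the commit message against both conditions. */
static void check_both_hmis(void)
{
	uint64_t hmer = HMER_TFAC_ERROR | HMER_TFMR_PARITY;

	/* First HMI: TFMR[60]=1, TB is marked invalid either way. */
	assert(tb_invalid_old(hmer, TFMR_CORRUPT));
	assert(tb_invalid_new(TFMR_CORRUPT));

	/* Second HMI: TFMR[60]=0, only the old check marks TB invalid. */
	assert(tb_invalid_old(hmer, 0));
	assert(!tb_invalid_new(0));
}
```

With HMER[5] still set on the second HMI, the old condition flags a valid TB
as invalid, which is what leads to the spurious message; the new condition
does not.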

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
 core/hmi.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)


diff --git a/core/hmi.c b/core/hmi.c
index d9dd83c62..b01a2bf32 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -1045,14 +1045,13 @@  error_out:
 	return recover;
-static int handle_tfac_errors(uint64_t hmer, struct OpalHMIEvent *hmi_evt,
-			      uint64_t *out_flags)
+static int handle_tfac_errors(struct OpalHMIEvent *hmi_evt, uint64_t *out_flags)
 	int recover = -1;
 	uint64_t tfmr = mfspr(SPR_TFMR);
-	/* A TFMR parity error makes us ignore all the local stuff */
+	/* A TFMR parity/corrupt error makes us ignore all the local stuff.*/
+	if (tfmr & SPR_TFMR_TFMR_CORRUPT) {
 		/* Mark TB as invalid for now as we don't trust TFMR, we'll fix
 		 * it up later
@@ -1160,7 +1159,7 @@  static int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt,
 		hmi_print_debug("Timer Facility Error", hmer);
 		mtspr(SPR_HMER, ~handled);
-		recover = handle_tfac_errors(hmer, hmi_evt, out_flags);
+		recover = handle_tfac_errors(hmi_evt, out_flags);
 		handled = 0;