
phb4/capp: Only reset FIR bits that cause capp machine check

Message ID 20181101053515.4344-1-vaibhav@linux.ibm.com
State Accepted
Series phb4/capp: Only reset FIR bits that cause capp machine check

Checks

Context                       Check    Description
snowpatch_ozlabs/apply_patch  success  master/apply_patch Successfully applied
snowpatch_ozlabs/make_check   success  Test make_check on branch master

Commit Message

Vaibhav Jain Nov. 1, 2018, 5:35 a.m. UTC
During CAPP recovery, do_capp_recovery_scoms() resets the CAPP FIR
register immediately after CAPP recovery completes. This has the
unintended side effect of preventing PRD from analyzing and
reporting the error. If PRD reads the CAPP FIR after OPAL has
already reset it, it logs a critical error complaining "No active
error bits found".

To prevent this, update do_capp_recovery_scoms() to reset only the
FIR bits that cause a CAPP machine check (local xstop). This is done
by reading the CAPP FIR Action0/1 and Mask registers and generating a
mask, which is then written to the CAPP_FIR_CLEAR register.

Cc: stable
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 hw/phb4.c           | 17 +++++++++++++++++
 include/phb4-capp.h |  1 +
 2 files changed, 18 insertions(+)

Comments

Stewart Smith Nov. 2, 2018, 8:01 a.m. UTC | #1

Cheers, Merged to master as of 999246716d2da347aad46a28ed9899b832bffe6c
and into 6.0.x as of bf93742f5c047082a759dda6799e42808e2f9135 for 6.0.11

Patch

diff --git a/hw/phb4.c b/hw/phb4.c
index 10df206b..7a1f58e3 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -3055,7 +3055,24 @@  static int do_capp_recovery_scoms(struct phb4 *p)
 
 	/* Check if the recovery failed or passed */
 	if (reg & PPC_BIT(1)) {
+		uint64_t act0, act1, mask, fir;
+
+		/* Use the Action0/1 and mask to only clear the bits
+		 * that cause local checkstop. Other bits need the attention
+		 * of the PRD daemon.
+		 */
+		xscom_read(p->chip_id, CAPP_FIR_ACTION0 + offset, &act0);
+		xscom_read(p->chip_id, CAPP_FIR_ACTION1 + offset, &act1);
+		xscom_read(p->chip_id, CAPP_FIR_MASK + offset, &mask);
+		xscom_read(p->chip_id, CAPP_FIR + offset, &fir);
+
+		fir = ~(fir & ~mask & act0 & act1);
 		PHBDBG(p, "Doing CAPP recovery scoms\n");
+
+		/* Update the CAPP FIR, clearing only bits causing local checkstop */
+		PHBDBG(p, "Resetting CAPP Fir with mask 0x%016llX\n", fir);
+		xscom_write(p->chip_id, CAPP_FIR_CLEAR + offset, fir);
+
 		/* disable snoops */
 		xscom_write(p->chip_id, SNOOP_CAPI_CONFIG + offset, 0);
 		load_capp_ucode(p);
diff --git a/include/phb4-capp.h b/include/phb4-capp.h
index 68200ac5..2f309d4c 100644
--- a/include/phb4-capp.h
+++ b/include/phb4-capp.h
@@ -23,6 +23,7 @@ 
 #define CAPP_APC_MASTER_ARRAY_WRITE_REG		0x2010842  /* Satellite 2 */
 
 #define CAPP_FIR				0x2010800
+#define CAPP_FIR_CLEAR				0x2010801
 #define CAPP_FIR_MASK				0x2010803
 #define CAPP_FIR_ACTION0			0x2010806
 #define CAPP_FIR_ACTION1			0x2010807