[v2] opal/hmi: Wakeup the cpu before reading core_fir

Message ID 20180827101354.678-1-vaibhav@linux.ibm.com
State Accepted
Headers show
Series
  • [v2] opal/hmi: Wakeup the cpu before reading core_fir
Related show

Checks

Context Check Description
snowpatch_ozlabs/make_check success Test make_check on branch master
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied

Commit Message

Vaibhav Jain Aug. 27, 2018, 10:13 a.m.
When stop state 5 is enabled, reading the core_fir during an HMI can
result in a xscom read error with xscom_read() returning an
OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At
present this return error code is not handled in decode_core_fir()
hence the invalid core_fir value is sent to the kernel where it
interprets it as a FATAL hmi causing a system check-stop.

This can be prevented by forcing the core to wake-up using before
reading the core_fir. Hence this patch wraps the call to
read_core_fir() within calls to dctl_set_special_wakeup() and
dctl_clear_special_wakeup().

Suggested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>

---
Change-log:

v2	-> Simplified the flow to just wrap read_core_fir() in
	dctl_set/clear_special_wakeup instead. [Mahesh]
---
 core/hmi.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Gautham R Shenoy Sept. 6, 2018, 3:16 p.m. | #1
On Mon, Aug 27, 2018 at 03:43:54PM +0530, Vaibhav Jain wrote:
> When stop state 5 is enabled, reading the core_fir during an HMI can
> result in a xscom read error with xscom_read() returning an
> OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At
> present this return error code is not handled in decode_core_fir()
> hence the invalid core_fir value is sent to the kernel where it
> interprets it as a FATAL hmi causing a system check-stop.
> 
> This can be prevented by forcing the core to wake-up using before
> reading the core_fir. Hence this patch wraps the call to
> read_core_fir() within calls to dctl_set_special_wakeup() and
> dctl_clear_special_wakeup().
> 
> Suggested-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>

Looks ok to me.

Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> 
> ---
> Change-log:
> 
> v2	-> Simplified the flow to just wrap read_core_fir() in
> 	dctl_set/clear_special_wakeup instead. [Mahesh]
> ---
>  core/hmi.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/core/hmi.c b/core/hmi.c
> index 1f665a2f..d9240cee 100644
> --- a/core/hmi.c
> +++ b/core/hmi.c
> @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>  {
>  	uint64_t core_fir;
>  	uint32_t core_id;
> -	int i;
> +	int i, swkup_rc;
>  	bool found = false;
>  	int64_t ret;
>  	const char *loc;
> @@ -390,14 +390,19 @@ static bool decode_core_fir(struct cpu_thread *cpu,
> 
>  	core_id = pir_to_core_id(cpu->pir);
> 
> +	/* Force the core to wakeup, otherwise reading core_fir is unrealiable
> +	 * if stop-state 5 is enabled.
> +	 */
> +	swkup_rc = dctl_set_special_wakeup(cpu);
> +
>  	/* Get CORE FIR register value. */
>  	ret = read_core_fir(cpu->chip_id, core_id, &core_fir);
> 
> -	if (ret == OPAL_HARDWARE) {
> -		prerror("XSCOM error reading CORE FIR\n");
> -		/* If the FIR can't be read, we should checkstop. */
> -		return true;
> -	} else if (ret == OPAL_WRONG_STATE) {
> +	if (!swkup_rc)
> +		dctl_clear_special_wakeup(cpu);
> +
> +
> +	if (ret == OPAL_WRONG_STATE) {
>  		/*
>  		 * CPU is asleep, so it probably didn't cause the checkstop.
>  		 * If no other HMI cause is found a "catchall" checkstop
> @@ -408,6 +413,10 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>  		      "FIR read failed, chip %d core %d asleep\n",
>  		      cpu->chip_id, core_id);
>  		return false;
> +	} else if (ret != OPAL_SUCCESS) {
> +		prerror("XSCOM error reading CORE FIR\n");
> +		/* If the FIR can't be read, we should checkstop. */
> +		return true;
>  	}
> 
>  	if (!core_fir)
> -- 
> 2.17.1
>
Shilpasri G Bhat Sept. 7, 2018, 10:34 a.m. | #2
On 08/27/2018 03:43 PM, Vaibhav Jain wrote:
> When stop state 5 is enabled, reading the core_fir during an HMI can
> result in a xscom read error with xscom_read() returning an
> OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At
> present this return error code is not handled in decode_core_fir()
> hence the invalid core_fir value is sent to the kernel where it
> interprets it as a FATAL hmi causing a system check-stop.
> 
> This can be prevented by forcing the core to wake-up using before
> reading the core_fir. Hence this patch wraps the call to
> read_core_fir() within calls to dctl_set_special_wakeup() and
> dctl_clear_special_wakeup().
> 
> Suggested-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
> 

Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>

> ---
> Change-log:
> 
> v2	-> Simplified the flow to just wrap read_core_fir() in
> 	dctl_set/clear_special_wakeup instead. [Mahesh]
> ---
>  core/hmi.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/core/hmi.c b/core/hmi.c
> index 1f665a2f..d9240cee 100644
> --- a/core/hmi.c
> +++ b/core/hmi.c
> @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>  {
>  	uint64_t core_fir;
>  	uint32_t core_id;
> -	int i;
> +	int i, swkup_rc;
>  	bool found = false;
>  	int64_t ret;
>  	const char *loc;
> @@ -390,14 +390,19 @@ static bool decode_core_fir(struct cpu_thread *cpu,
> 
>  	core_id = pir_to_core_id(cpu->pir);
> 
> +	/* Force the core to wakeup, otherwise reading core_fir is unrealiable
> +	 * if stop-state 5 is enabled.
> +	 */
> +	swkup_rc = dctl_set_special_wakeup(cpu);
> +
>  	/* Get CORE FIR register value. */
>  	ret = read_core_fir(cpu->chip_id, core_id, &core_fir);
> 
> -	if (ret == OPAL_HARDWARE) {
> -		prerror("XSCOM error reading CORE FIR\n");
> -		/* If the FIR can't be read, we should checkstop. */
> -		return true;
> -	} else if (ret == OPAL_WRONG_STATE) {
> +	if (!swkup_rc)
> +		dctl_clear_special_wakeup(cpu);
> +
> +
> +	if (ret == OPAL_WRONG_STATE) {
>  		/*
>  		 * CPU is asleep, so it probably didn't cause the checkstop.
>  		 * If no other HMI cause is found a "catchall" checkstop
> @@ -408,6 +413,10 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>  		      "FIR read failed, chip %d core %d asleep\n",
>  		      cpu->chip_id, core_id);
>  		return false;
> +	} else if (ret != OPAL_SUCCESS) {
> +		prerror("XSCOM error reading CORE FIR\n");
> +		/* If the FIR can't be read, we should checkstop. */
> +		return true;
>  	}
> 
>  	if (!core_fir)
>
Stewart Smith Oct. 16, 2018, 7:52 a.m. | #3
Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> When stop state 5 is enabled, reading the core_fir during an HMI can
> result in a xscom read error with xscom_read() returning an
> OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At
> present this return error code is not handled in decode_core_fir()
> hence the invalid core_fir value is sent to the kernel where it
> interprets it as a FATAL hmi causing a system check-stop.
>
> This can be prevented by forcing the core to wake-up using before
> reading the core_fir. Hence this patch wraps the call to
> read_core_fir() within calls to dctl_set_special_wakeup() and
> dctl_clear_special_wakeup().
>
> Suggested-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>

Thanks! Merged to master as of 7cef472ed1fea75cc63cb0d8ac86d6a13d08517d

Patch

diff --git a/core/hmi.c b/core/hmi.c
index 1f665a2f..d9240cee 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -379,7 +379,7 @@  static bool decode_core_fir(struct cpu_thread *cpu,
 {
 	uint64_t core_fir;
 	uint32_t core_id;
-	int i;
+	int i, swkup_rc;
 	bool found = false;
 	int64_t ret;
 	const char *loc;
@@ -390,14 +390,19 @@  static bool decode_core_fir(struct cpu_thread *cpu,
 
 	core_id = pir_to_core_id(cpu->pir);
 
+	/* Force the core to wakeup, otherwise reading core_fir is unrealiable
+	 * if stop-state 5 is enabled.
+	 */
+	swkup_rc = dctl_set_special_wakeup(cpu);
+
 	/* Get CORE FIR register value. */
 	ret = read_core_fir(cpu->chip_id, core_id, &core_fir);
 
-	if (ret == OPAL_HARDWARE) {
-		prerror("XSCOM error reading CORE FIR\n");
-		/* If the FIR can't be read, we should checkstop. */
-		return true;
-	} else if (ret == OPAL_WRONG_STATE) {
+	if (!swkup_rc)
+		dctl_clear_special_wakeup(cpu);
+
+
+	if (ret == OPAL_WRONG_STATE) {
 		/*
 		 * CPU is asleep, so it probably didn't cause the checkstop.
 		 * If no other HMI cause is found a "catchall" checkstop
@@ -408,6 +413,10 @@  static bool decode_core_fir(struct cpu_thread *cpu,
 		      "FIR read failed, chip %d core %d asleep\n",
 		      cpu->chip_id, core_id);
 		return false;
+	} else if (ret != OPAL_SUCCESS) {
+		prerror("XSCOM error reading CORE FIR\n");
+		/* If the FIR can't be read, we should checkstop. */
+		return true;
 	}
 
 	if (!core_fir)