[v3] powerpc/pseries: Handle UE event for memcpy_mcsafe

Message ID 20200322160525.7624-1-ganeshgr@linux.ibm.com (mailing list archive)
State Changes Requested
Series [v3] powerpc/pseries: Handle UE event for memcpy_mcsafe

Checks

Context                           Check    Description
snowpatch_ozlabs/apply_patch      success  Successfully applied on branch powerpc/merge (a87b93bdf800a4d7a42d95683624a4516e516b4f)
snowpatch_ozlabs/build-ppc64le    success  Build succeeded
snowpatch_ozlabs/build-ppc64be    success  Build succeeded
snowpatch_ozlabs/build-ppc64e     success  Build succeeded
snowpatch_ozlabs/build-pmac32     success  Build succeeded
snowpatch_ozlabs/checkpatch       success  total: 0 errors, 0 warnings, 0 checks, 26 lines checked
snowpatch_ozlabs/needsstable      success  Patch has no Fixes tags

Commit Message

Ganesh Goudar March 22, 2020, 4:05 p.m. UTC
If we hit UE at an instruction with a fixup entry, flag to
ignore the event and set nip to continue execution at the
fixup entry.
For powernv these changes are already made by
commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")

Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Santosh S <santosh@fossix.org>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
V2: Fixes a trivial checkpatch error in commit msg.
V3: Use proper subject prefix.
---
 arch/powerpc/platforms/pseries/ras.c | 8 ++++++++
 1 file changed, 8 insertions(+)
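
The commit message describes looking up the faulting instruction in the exception table and, on a hit, resuming at the fixup address. The following is a minimal userspace sketch of that idea, not the kernel implementation. `struct extable_entry`, `struct fake_regs`, `search_table()`, `handle_ue()` and all addresses are made up for illustration, and the table holds absolute addresses, which is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for an exception table entry (hypothetical layout). */
struct extable_entry {
	uintptr_t insn;   /* address of an instruction that may fault */
	uintptr_t fixup;  /* address to resume at if it does */
};

/* Illustrative stand-in for the saved register state. */
struct fake_regs {
	uintptr_t nip;    /* next instruction pointer */
};

/* Sorted by insn so it can be binary-searched. Addresses are invented. */
static const struct extable_entry table[] = {
	{ 0x1000, 0x9000 },
	{ 0x1040, 0x9010 },
	{ 0x2000, 0x9020 },
};

/* Return the entry whose insn equals addr, or NULL if there is none. */
static const struct extable_entry *search_table(uintptr_t addr)
{
	size_t lo = 0, hi = sizeof(table) / sizeof(table[0]);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].insn == addr)
			return &table[mid];
		if (table[mid].insn < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Model of the UE path: if the faulting instruction has a fixup entry,
 * redirect nip to it and report the error as recovered (1). Otherwise
 * leave nip untouched and report it as not recovered (0).
 */
static int handle_ue(struct fake_regs *regs)
{
	const struct extable_entry *entry;

	assert(regs != NULL);
	entry = search_table(regs->nip);
	if (!entry)
		return 0;
	regs->nip = entry->fixup;
	return 1;
}
```

Calling `handle_ue()` with `nip` set to `0x1040` rewrites `nip` to `0x9010` and returns 1, while an address with no entry, such as `0x1044`, is left alone and returns 0.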

Comments

Michael Ellerman March 24, 2020, 5:27 a.m. UTC | #1
Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
> If we hit UE at an instruction with a fixup entry, flag to
> ignore the event and set nip to continue execution at the
> fixup entry.

You don't explain why we would want to do that. Or what the consequences
are if we *don't* do it.

As such it's unclear if this is an important fix or just a nice-to-have.

> For powernv these changes are already made by
> commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")

We have masses of code that supposedly abstracts the MCE logic. How did
we end up in the situation where we're having to write the same fix
twice for different platforms?

cheers

> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Reviewed-by: Santosh S <santosh@fossix.org>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---
> V2: Fixes a trivial checkpatch error in commit msg.
> V3: Use proper subject prefix.
> ---
>  arch/powerpc/platforms/pseries/ras.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 43710b69e09e..58e2483fbb1a 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -10,6 +10,7 @@
>  #include <linux/fs.h>
>  #include <linux/reboot.h>
>  #include <linux/irq_work.h>
> +#include <linux/extable.h>
>  
>  #include <asm/machdep.h>
>  #include <asm/rtas.h>
> @@ -505,6 +506,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>  	int initiator = rtas_error_initiator(errp);
>  	int severity = rtas_error_severity(errp);
>  	u8 error_type, err_sub_type;
> +	const struct exception_table_entry *entry;
>  
>  	if (initiator == RTAS_INITIATOR_UNKNOWN)
>  		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
> @@ -558,6 +560,12 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>  	switch (mce_log->error_type) {
>  	case MC_ERROR_TYPE_UE:
>  		mce_err.error_type = MCE_ERROR_TYPE_UE;
> +		entry = search_kernel_exception_table(regs->nip);
> +		if (entry) {
> +			mce_err.ignore_event = true;
> +			regs->nip = extable_fixup(entry);
> +			disposition = RTAS_DISP_FULLY_RECOVERED;
> +		}
>  		switch (err_sub_type) {
>  		case MC_ERROR_UE_IFETCH:
>  			mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;
> -- 
> 2.17.2
Ganesh Goudar March 24, 2020, 11:31 a.m. UTC | #2
On 3/24/20 10:57 AM, Michael Ellerman wrote:
> Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
>> If we hit UE at an instruction with a fixup entry, flag to
>> ignore the event and set nip to continue execution at the
>> fixup entry.
> You don't explain why we would want to do that. Or what the consequences
> are if we *don't* do it.
>
> As such it's unclear if this is an important fix or just a nice-to-have.

We want to avoid a panic if we hit an MCE during memcpy from pmem devices, because
the system is still recoverable and the copy should just result in -EIO. So we flag the
UE event here to be ignored. I will respin with a better commit message.

>> For powernv these changes are already made by
>> commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")
> We have masses of code that supposedly abstracts the MCE logic. How did
> we end up in the situation where we're having to write the same fix
> twice for different platforms?

What pseries and powernv have in common now is saving the MCE event for deferred
handling, and the deferred handling itself. In my view it becomes a bit messy to return
the disposition (UE recovered) from common code. So what we can have is a common function
that searches for the exception table entry and updates nip with the fixup address, and call
it from different places for pseries and powernv. If you are OK with that, I'll spin the next version.
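
The split proposed above could look roughly like the sketch below: one shared helper does the lookup and nip update, and each platform caller decides what "recovered" means in its own terms. This is a userspace model under stated assumptions, not the code that was posted. `mce_try_fixup()`, `pseries_handle_ue()`, `powernv_handle_ue()`, the `fake_*` types and the `FAKE_DISP_*` values are all hypothetical names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct fixup_entry { uintptr_t insn, fixup; };
struct fake_regs { uintptr_t nip; };
struct fake_mce_err { bool ignore_event; };

/* Placeholder dispositions; the values are not the real RTAS ones. */
enum fake_disposition {
	FAKE_DISP_NOT_RECOVERED,
	FAKE_DISP_FULLY_RECOVERED,
};

/* Invented addresses, for illustration only. */
static const struct fixup_entry fixups[] = {
	{ 0x1000, 0x9000 },
	{ 0x2000, 0x9020 },
};

/*
 * Shared helper (hypothetical): find a fixup for regs->nip, and if one
 * exists, redirect nip and mark the event to be ignored. It says nothing
 * about platform-specific dispositions.
 */
static bool mce_try_fixup(struct fake_regs *regs, struct fake_mce_err *err)
{
	assert(regs != NULL && err != NULL);
	for (size_t i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].insn == regs->nip) {
			regs->nip = fixups[i].fixup;
			err->ignore_event = true;
			return true;
		}
	}
	return false;
}

/* pseries-style caller: translates the result into a disposition. */
static enum fake_disposition pseries_handle_ue(struct fake_regs *regs)
{
	struct fake_mce_err err = { .ignore_event = false };

	if (mce_try_fixup(regs, &err))
		return FAKE_DISP_FULLY_RECOVERED;
	return FAKE_DISP_NOT_RECOVERED;
}

/* powernv-style caller: only needs a handled / not handled answer. */
static bool powernv_handle_ue(struct fake_regs *regs)
{
	struct fake_mce_err err = { .ignore_event = false };

	return mce_try_fixup(regs, &err);
}
```

Keeping the disposition out of the helper is the point of the design: the common code only knows about exception tables, and the firmware-specific result stays in each platform's caller.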

> cheers
>
>> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> Reviewed-by: Santosh S <santosh@fossix.org>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>> V2: Fixes a trivial checkpatch error in commit msg.
>> V3: Use proper subject prefix.
>> ---
>>   arch/powerpc/platforms/pseries/ras.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 43710b69e09e..58e2483fbb1a 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -10,6 +10,7 @@
>>   #include <linux/fs.h>
>>   #include <linux/reboot.h>
>>   #include <linux/irq_work.h>
>> +#include <linux/extable.h>
>>   
>>   #include <asm/machdep.h>
>>   #include <asm/rtas.h>
>> @@ -505,6 +506,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>   	int initiator = rtas_error_initiator(errp);
>>   	int severity = rtas_error_severity(errp);
>>   	u8 error_type, err_sub_type;
>> +	const struct exception_table_entry *entry;
>>   
>>   	if (initiator == RTAS_INITIATOR_UNKNOWN)
>>   		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
>> @@ -558,6 +560,12 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>   	switch (mce_log->error_type) {
>>   	case MC_ERROR_TYPE_UE:
>>   		mce_err.error_type = MCE_ERROR_TYPE_UE;
>> +		entry = search_kernel_exception_table(regs->nip);
>> +		if (entry) {
>> +			mce_err.ignore_event = true;
>> +			regs->nip = extable_fixup(entry);
>> +			disposition = RTAS_DISP_FULLY_RECOVERED;
>> +		}
>>   		switch (err_sub_type) {
>>   		case MC_ERROR_UE_IFETCH:
>>   			mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;
>> -- 
>> 2.17.2
Patch

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 43710b69e09e..58e2483fbb1a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -10,6 +10,7 @@ 
 #include <linux/fs.h>
 #include <linux/reboot.h>
 #include <linux/irq_work.h>
+#include <linux/extable.h>
 
 #include <asm/machdep.h>
 #include <asm/rtas.h>
@@ -505,6 +506,7 @@  static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 	int initiator = rtas_error_initiator(errp);
 	int severity = rtas_error_severity(errp);
 	u8 error_type, err_sub_type;
+	const struct exception_table_entry *entry;
 
 	if (initiator == RTAS_INITIATOR_UNKNOWN)
 		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
@@ -558,6 +560,12 @@  static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 	switch (mce_log->error_type) {
 	case MC_ERROR_TYPE_UE:
 		mce_err.error_type = MCE_ERROR_TYPE_UE;
+		entry = search_kernel_exception_table(regs->nip);
+		if (entry) {
+			mce_err.ignore_event = true;
+			regs->nip = extable_fixup(entry);
+			disposition = RTAS_DISP_FULLY_RECOVERED;
+		}
 		switch (err_sub_type) {
 		case MC_ERROR_UE_IFETCH:
 			mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;